Optimal Schedules for Parallelizing Anytime Algorithms: 
The Case of Shared Resources 



The performance of anytime algorithms can be improved by simultaneously solving 
several instances of algorithm-problem pairs. These pairs may include different instances 
of a problem (such as starting from a different initial state), different algorithms (if several 
alternatives exist), or several runs of the same algorithm (for non-deterministic algorithms). 
In this paper we present a methodology for designing an optimal scheduling policy based 
on the statistical characteristics of the algorithms involved. We formally analyze the case 
where the processes share resources (a single-processor model), and provide an algorithm 
for optimal scheduling. We analyze, theoretically and empirically, the behavior of our 
scheduling algorithm for various distribution types. Finally, we present empirical results of 
applying our scheduling algorithm to the Latin Square problem. 

1. Introduction 

Assume that our task is to learn a concept with a predefined success rate, measured on a 
given test set. Assume that we can use two alternative learning algorithms, one which learns 
fast but requires some preprocessing, and another which works more slowly but requires no 
preprocessing. Can we possibly benefit from using both learning algorithms in parallel to 
solve one learning task on a single-processor machine? 

Another area of application is that of constraint satisfaction problems. Assume that 
a student tries to decide between two elective courses by trying to schedule each of them 
with the set of her compulsory courses. Should the student try to solve the two sets of 
constraints sequentially or should the two computations be somehow interleaved? 

Assume now that a crawler searches for a specific page in a site. If we had more than one 
starting point, the process could be speeded up by simultaneous application of the crawler 
from a few (or all) of them. However, what would be the optimal strategy if the bandwidth 
were restricted? 

What do the above examples have in common? 

• There are potential benefits to be gained from the uncertainty in the amount of 
resources that will be required to solve more than one instance of the 
algorithm-problem pair. We can use different algorithms (in the first example) and 
different problems (in the last two examples). For non-deterministic algorithms, we 
can also use different runs of the same algorithm. 
• Each process is executed with the purpose of satisfying a given goal predicate. The 
task is considered accomplished when one of the runs succeeds. 

• If the goal predicate is satisfied at time t*, then it is also satisfied at any time t > t*. 
This property is equivalent to utility monotonicity of anytime algorithms (Dean & 
Boddy, 1988; Horvitz, 1987), where solution quality is restricted to Boolean values. 

Our objective is to provide a schedule that minimizes the expected cost, possibly under 
some constraints (for example, processes may share resources). Such a problem definition is 
typical for rational-bounded reasoning (Simon, 1982; Russell & Wefald, 1991). This problem 
resembles those faced by contract algorithms (Russell & Zilberstein, 1991; Zilberstein, 1993). 
There, given the allocated resources, the task is to construct an algorithm providing a 
solution of the highest quality. In our case, given quality requirements, the task is to 
construct an algorithm that solves the problem using minimal resources. 

There are several research works that deal with similar problems. Simple parallelization, 
with no information exchange between the processes, may speed up the process due to 
high diversity in solution times. For example, Knight (1993) showed that using many 
reactive agents employing RTA* search (Korf, 1990) is more beneficial than using a single 
deliberative agent. Another example is the work of Yokoo and Kitamura (1996), who used 
several search agents in parallel, with agent rearrangement after preallotted periods of time. 
Janakiram, Agrawal, and Mehrotra (1988) showed that for many common distributions of 
solution time, simple parallelization leads to at most linear speedup. One exception is the 
family of heavy-tailed distributions (Gomes, Selman, & Kautz, 1998) for which it is possible 
to obtain superlinear speedup by simple parallelization. 

A superlinear speedup can also be obtained when we have access to the internal structure 
of the processes involved. For example, Clearwater, Hogg, and Huberman (1992) reported 
superlinear speedup for cryptarithmetic problems as a result of information exchange 
between the processes. Another example is the work of Kumar and Rao (Rao & Kumar, 
1987; Kumar & Rao, 1987; Rao & Kumar, 1993), devoted to parallelizing standard search 
algorithms, where superlinear speedup is obtained by dividing the search space. 

An interesting domain-independent approach is based on "portfolio" construction 
(Huberman, Lukose, & Hogg, 1997; Gomes & Selman, 1997). In this approach, a different 
amount of resources is allotted to each process. This can reduce both expected resource 
consumption and its variance. 

In the case of non-deterministic algorithms, another way to benefit from solution time 
diversity is to restart the same algorithm in an attempt to switch to a better trajectory. Such 
a framework was analyzed in detail by Luby, Sinclair, and Zuckerman (1993) for the case of 
a single processor and by Luby and Ertel (1994) for the multiprocessor case. In particular, 
it was proven that for a single processor, the optimal strategy is to periodically restart the 
algorithm after a constant amount of time until the solution is found. This strategy was 
successfully applied to combinatorial search problems by Gomes, Selman, and Kautz (1998). 

There are several settings, however, where the restart strategy is not optimal. If the 
goal is to schedule a number of runs of a single non-deterministic algorithm, such that 
this number is limited due to the nature of the problem (for example, robotic search), 
the restart strategy is applicable but not optimal. A special case of the above settings is 
scheduling a number of runs of a deterministic algorithm with a finite set of available initial 
configurations (inputs). Finally, the case where the goal is to schedule a set of algorithms 
different from each other is out of the scope of the restart strategy. 

The goal of this research is to develop a methodology for designing an optimal scheduling 
policy for any number of instances of algorithm-problem pairs, where the algorithms can 
be either deterministic or non-deterministic. We present a formal framework for scheduling 
parallel anytime algorithms for the case where the processes share resources (a 
single-processor model), based on the statistical characteristics of the algorithms involved. 
The framework assumes that we know the probability that the goal condition is satisfied as a 
function of time (a performance profile (Simon, 1955; Boddy & Dean, 1994) restricted to 
Boolean quality values). We analyze the properties of optimal schedules for the 
suspend-resume model, where resources are allocated on a mutual exclusion basis, and 
show that in most cases an extension of the framework to intensity control, where resources 
may be allocated simultaneously and proportionately to multiple demands, does not yield 
better schedules. We also present an algorithm for building optimal schedules. Finally, we 
demonstrate experimental results for the optimal schedules. 

2. Motivation 

Before starting the formal discussion, we would like to illustrate how different scheduling 
strategies can affect the performance of a system of two search processes. The first example 
has a very simple setup which allows us to perform a full analysis. In the second example, 
we show quantitative results for a real CSP problem. 

2.1 Scheduling DFS Search Processes 

Assume DFS with random tie-breaking is applied to the simple search space shown in Figure 1, 
but that only two runs of the algorithm are allowed.¹ There is a very large number of 
paths to the goal, half of them of length 10, a quarter of them of length 40, and a quarter of 
them of length 160. When one of the processes finds the solution, the task is considered 
accomplished. 




Figure 1: A simple search task: two DFS-based agents search for a path from A to B. Scheduling 
the processes may reduce costs. 



1. Such a limit can follow, for example, from physical constraints, such as for the problem of robotic search. 
For an unlimited number of runs the optimal results would be provided by the restart strategy. 
We consider a single-processor system, where the two processes cannot run simultaneously. 
Let us denote the processes by $A_1$ and $A_2$, and by $L_1$ and $L_2$ the actual path lengths 
for $A_1$ and $A_2$ respectively for the particular run. 

The application of a single process (without loss of generality, $A_1$) gives us the expected 
execution time of 1/2 × 10 + 1/4 × 40 + 1/4 × 160 = 55, as is shown in Figure 2. 







Figure 2: Path lengths, probabilities and costs for running a single process 

We can improve the performance by simulating a simultaneous execution of two processes. 
For this purpose, we allow each of the processes to expand a single node and then 
switch to the other process (without loss of generality, $A_1$ starts first). In this case, the 
expected execution time is 1/2 × 19 + 1/4 × 20 + 1/8 × 79 + 1/16 × 80 + 1/16 × 319 = 49.3125, 
as is shown in Figure 3. 

Finally, if we know the distribution of path lengths, we can allow $A_1$ to open 10 nodes; 
if $A_1$ fails, we can stop it and allow $A_2$ to open 10 nodes; if $A_2$ fails as well, we can allow 
$A_1$ to open the next 30 nodes, and so forth. In this scenario, $A_1$ and $A_2$ switch after 10 
and 40 nodes (if both processes fail to find a solution after 40 nodes, it is guaranteed to be 
found by $A_1$ after 160 nodes). This scheme is shown in Figure 4, and the expected time is 
1/2 × 10 + 1/4 × 20 + 1/8 × 50 + 1/16 × 80 + 1/16 × 200 = 33.75. 
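
The three expected values above can be checked by enumerating all pairs of path lengths. The following is a minimal sketch, not part of the original analysis: the helper names (`cost`, `expected_cost`) and the representation of a schedule as a list of (process, nodes) slices are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

# Path-length distribution from Section 2.1: length -> probability.
LENGTHS = {10: Fraction(1, 2), 40: Fraction(1, 4), 160: Fraction(1, 4)}


def cost(schedule, l1, l2):
    """Total nodes expanded until one process reaches the goal.

    `schedule` is a list of (process, nodes) slices executed in order,
    where process is 0 for A1 and 1 for A2 (an illustrative encoding).
    """
    need = [l1, l2]   # nodes each process needs to reach the goal
    done = [0, 0]     # nodes each process has expanded so far
    total = 0
    for proc, nodes in schedule:
        remaining = need[proc] - done[proc]
        if remaining <= nodes:
            return total + remaining
        done[proc] += nodes
        total += nodes
    raise ValueError("schedule too short to reach the goal")


def expected_cost(schedule):
    """Expectation over independent path lengths of the two runs."""
    return sum(p1 * p2 * cost(schedule, l1, l2)
               for (l1, p1), (l2, p2) in product(LENGTHS.items(), repeat=2))


single = [(0, 160)]                             # A1 alone
alternating = [(i % 2, 1) for i in range(320)]  # switch after every node
interleaved = [(0, 10), (1, 10), (0, 30), (1, 30), (0, 120)]

for name, schedule in [("single", single),
                       ("alternating", alternating),
                       ("interleaved", interleaved)]:
    print(name, float(expected_cost(schedule)))
```

Exact rational arithmetic is used so that the results can be compared with the figures in the text without rounding concerns.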

2.2 The Latin Square Example 

The task in the Latin Square problem is to place N symbols on an N x N square such 
that each symbol appears only once in each row and each column. An example is shown in 
Figure 5. 
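
The row and column constraint can be stated as a short check. This is only an illustrative sketch: the function names and the use of `None` for an empty cell are assumptions, and the check tests consistency of a partial filling, not whether it can be completed.

```python
def is_consistent(grid):
    """Check that no symbol repeats in any row or column.

    `grid` is a list of N rows of length N; empty cells are None.
    A consistent partial filling may still be unsolvable.
    """
    n = len(grid)
    if any(len(row) != n for row in grid):
        return False
    columns = [[grid[r][c] for r in range(n)] for c in range(n)]
    for line in list(grid) + columns:
        filled = [cell for cell in line if cell is not None]
        if len(filled) != len(set(filled)):
            return False
    return True


def is_latin_square(grid):
    """A complete square: consistent, no empty cells, exactly N symbols."""
    symbols = {cell for row in grid for cell in row}
    return (is_consistent(grid)
            and None not in symbols
            and len(symbols) == len(grid))


square = [[1, 2, 3],
          [2, 3, 1],
          [3, 1, 2]]
print(is_latin_square(square))
```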

A more interesting problem arises when the square is partially filled. The problem in 
this case may be solvable (see the left side of Figure 6) or unsolvable (see the right side 
of Figure 6). The problem of satisfiability of a partially filled Latin Square is a typical 
constraint-satisfaction problem. We consider a slight variation of this task. Let us assume 
that two partially filled squares are available, and we need to decide whether at least one of 
them is solvable. We assume that we are allocated a single processor. We attempt to speed 
up the time of finding a solution by starting to solve the two problems from two different 
initial configurations in parallel. 

Each of the processes employs a deterministic heuristic DFS with the First-Fail heuristic 
(Gomes & Selman, 1997). We consider 10%-filled 20 x 20 Latin Squares. The behavior 
of a single process measured on a set of 50,000 randomly generated samples is shown in 
Figure 3: Path lengths, probabilities and costs for simulating a simultaneous execution 

Figure 4: Path lengths, probabilities and costs for the interleaved execution 

Figure 5: An example of a 5 x 5 Latin Square. 



Figure 6: An example of solvable (to the left) and unsolvable (to the right) prefilled 5 x 5 Latin 
Squares. 



Figure 7. Figure 7(a) shows the probability of finding a solution as a function of the number 
of search steps, and Figure 7(b) shows the corresponding distribution density. Assume that 








Figure 7: The behavior of DFS with the First-Fail heuristic on 10%-filled 20 x 20 Latin Squares. 

(a) The probability of finding a solution as a function of the number of search steps; 

(b) The corresponding distribution density. 



each run is limited to 25,000 search steps (only 88.6% of the problems are solvable under this 
condition). If we apply the algorithm only on one of the available two initial configurations, 
the average number of search steps is 3777. If we run two processes in parallel (alternating 
after each step), we obtain a result of 1358 steps. If we allow a single switch at the optimal 
point (an analogue of the restart technique (Luby et al., 1993; Gomes et al., 1998) for two 
processes), we get 1376 steps on average (the optimal point is after 1311 steps). Finally, if 
we interleave the processes, switching at the points corresponding to 679, 3072, and 10208 
total steps, the average number of steps is 1177. The above results were averaged over a 
test set of 25,000 pairs of initial configurations. 

The last sequence of switch points is an optimal schedule for the process with behavior 
described by the graphs in Figure 7. In the rest of the paper we present an algorithm for 
deriving such optimal schedules. 
3. A Framework for Parallelization Scheduling 

In this section we formalize the intuitive description of parallelization scheduling. The first 
part of this framework is similar to our framework presented in (Finkelstein & Markovitch, 
2001). 

Let $S$ be a set of states, $t$ be a time variable with non-negative real values, and $A$ be a 
random process such that each realization (trajectory) $A(t)$ of $A$ represents a mapping from 
$\mathbb{R}^+$ to $S$. Let $X_0$ be a random variable defined over $S$. Since an algorithm Alg starting 
from an initial state $S_0$ corresponds to a single trajectory (for deterministic algorithms), or 
to a set of trajectories with an associated distribution (for non-deterministic algorithms), 
the pair $(X_0, \mathrm{Alg})$, where $X_0$ stands for the initial state, can be viewed as a random process. 
Drawing a trajectory for such a process corresponds, without loss of generality, to a 
two-step procedure: first an initial state $S_0$ is drawn for $X_0$, and then a trajectory $A(t)$ starting 
from $S_0$ is drawn for Alg. Thus, the source of randomness is either the randomness of the 
initial state, or the randomness of the algorithm (which can come from the algorithm itself 
or from the environment), or both. 

Let $S^* \subset S$ be a designated set of states, and $G : S \to \{0, 1\}$ be the characteristic 
function of $S^*$ called the goal predicate. The behavior of a trajectory $A(t)$ of $A$ with respect 
to the goal predicate $G$ can be written as $G(A(t))$, which we denote by $\hat{G}_A(t)$. We say 
that $A$ is monotonic over $G$ if and only if $\hat{G}_A(t)$ is a non-decreasing function for each 
trajectory $A(t)$ of $A$. Under the above assumptions $\hat{G}_A(t)$ is a step function with at most 
one discontinuity point. 

Let $A$ be monotonic over $G$. From the definitions above we can see that the behavior 
of $G$ for each trajectory $A(t)$ of $A$ can be described by a single point $t_{A,G}$, the first point 
after which the goal predicate is true, i.e., $t_{A,G} = \inf_t \{t \mid \hat{G}_A(t) = 1\}$. If $\hat{G}_A(t)$ is always 0, 
we say that $t_{A,G}$ is not defined. Therefore, we can define a random variable, which for each 
trajectory $A(t)$ of $A$ with $t_{A,G}$ defined, corresponds to $t_{A,G}$. The behavior of this variable 
can be described by its distribution function $F(t)$. At the points where $F(t)$ is differentiable, 
we use the probability density $f(t) = F'(t)$. 

It is important to note that in practice not every trajectory of $A$ leads to the goal 
predicate satisfaction even after infinitely large time. That means that the set of the 
trajectories where $t_{A,G}$ is undefined is not necessarily of measure zero. That is why we define 
the probability of success $p$ as the probability of $A(t)$ to have $t_{A,G}$ defined.² For the Latin 
Square example described in Section 2.2, the probability of success is 0.886, and the graphs 
in Figure 7 correspond to $pF(t)$ and $pf(t)$. 

Assume now that we have a system of $n$ random processes $A_1, \ldots, A_n$ with corresponding 
distribution functions $F_1, \ldots, F_n$ and goal predicates $G_1, \ldots, G_n$. If the distribution 
functions $F_i$ and $F_j$ are identical, we refer to $A_i$ and $A_j$ as F-equivalent. 

We define a schedule of the system as a set of binary functions $\{\theta_i\}$, where at each 
moment $t$, the $i$-th process is active if $\theta_i(t) = 1$ and idle otherwise. We refer to this scheme 
as suspend-resume scheduling. A possible generalization of this framework is to extend 
the suspend/resume control to a more refined mechanism that allows us to determine the 
2. Another way to express the possibility that a process will not reach a goal state is to use $F(t)$ that 
approaches $1 - p$ when $t \to \infty$. We prefer to use $p$ explicitly because the distribution function must meet 
the requirement $\lim_{t \to \infty} F(t) = 1$. 
intensity with which each process acts. For software processes, this means varying the 
fraction of CPU utilization; for tasks like robot navigation this implies changing the speed 
of the robots. Mathematically, using intensity control is equivalent to replacing the binary 
functions $\theta_i(t)$ with continuous functions with a range between zero and one.³ 

Note that scheduling makes the term time ambiguous. On one hand, we have the 
subjective time for each process, consumed only when the process is active. This kind of 
time corresponds to some resource consumed by the process. On the other hand, we have 
an objective time measured from the point of view of an external observer. The distribution 
function of each process is defined over its subjective time, while the cost function (see 
below) may use both kinds of times. Since we are using several processes, all the formulas 
in this paper are based on the objective time. 

Let us denote by $\sigma_i(t)$ the total time that process $i$ has been active before $t$. By 
definition, 

\[ \sigma_i(t) = \int_0^t \theta_i(x)\, dx. \tag{1} \]

In practice $\sigma_i(t)$ provides the mapping from the objective time $t$ to the subjective time of 
the $i$-th process, and we refer to these functions as subjective schedule functions. Since $\theta_i$ 
can be obtained from $\sigma_i$ by differentiation, we often describe schedules by $\{\sigma_i\}$ instead of 
$\{\theta_i\}$. 
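
Definition (1) can be illustrated for piecewise-constant schedules. The sketch below is an illustration only: the slice representation `(start, end, intensities)` and the function name `subjective_time` are assumptions, and the example schedule is the interleaved one from Section 2.1.

```python
def subjective_time(slices, process, t):
    """Subjective time sigma_i(t): the integral of theta_i over [0, t].

    `slices` is a list of (start, end, intensities) with piecewise-constant
    theta values; intensities[i] is theta_i on [start, end). Values 0 and 1
    give suspend-resume control, fractional values give intensity control.
    """
    total = 0.0
    for start, end, intensities in slices:
        overlap = min(end, t) - start
        if overlap > 0:
            total += intensities[process] * overlap
    return total


# Suspend-resume schedule of Section 2.1: A1 and A2 alternate, A1 first.
slices = [
    (0, 10, (1, 0)),
    (10, 20, (0, 1)),
    (20, 50, (1, 0)),
    (50, 80, (0, 1)),
    (80, 200, (1, 0)),
]

t = 60
print(subjective_time(slices, 0, t), subjective_time(slices, 1, t))
```

Under mutual exclusion the subjective times add up to the objective time, which the example reflects at `t = 60`.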

The processes $\{A_i\}$ with goal predicates $\{G_i\}$ running under schedules $\{\sigma_i\}$ result in 
a new process $A$, with a goal predicate $G$. $G$ is the disjunction of $G_i$ ($G(t) = \bigvee_i G_i(t)$), 
and therefore $A$ is monotonic over $G$. We denote the distribution function of the 
corresponding random variable by $F_n(t, \sigma_1, \ldots, \sigma_n)$, and the corresponding distribution density 
by $f_n(t, \sigma_1, \ldots, \sigma_n)$. 

Assume that we are given a monotonic non-decreasing cost function $u(t, t_1, \ldots, t_n)$, which 
depends on the objective time $t$ and the subjective times per process $t_i$. We also assume 
that $u(0, t_1, \ldots, t_n) = 0$. Since the subjective times can be calculated by $\sigma_i(t)$, we actually 
have $u = u(t, \sigma_1(t), \ldots, \sigma_n(t))$. 

The expected cost of schedule $\{\sigma_i\}$ can be expressed, therefore, as⁴ 

\[ E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{+\infty} u(t, \sigma_1, \ldots, \sigma_n)\, f_n(t, \sigma_1, \ldots, \sigma_n)\, dt \tag{2} \]

(for the sake of readability, we omit $t$ in $\sigma_i(t)$). Under the suspend-resume model 
assumptions, $\sigma_i$ must be differentiable (except for a countable set of process switch points) and 
have derivatives of 0 or 1 that would ensure correct values for $\theta_i$. Under intensity control 
assumptions, the derivatives of $\sigma_i$ must lie between 0 and 1. 

We consider two alternative setups for resource sharing between the processes: 

1. The processes share resources on a mutual exclusion basis. That means that exactly 
one process can be active at each moment, and the processes will be active one after 
another until the goal is reached by one of them. In this case the sum of derivatives 

3. A special case of such a setup using constant intensities was described by Huberman, Lukose, and 
Hogg (1997). 

4. The generalization to the case where the probability of success p is not 1 is considered at the end of the 
next section. 
of $\sigma_i$ is always one.⁵ The case of shared resources corresponds to the case of several 
processes running on a single processor. 

2. The processes are fully independent: there are no additional constraints on $\sigma_i$. This 
case corresponds to n independent processes running on n processors. 

Our goal is to find a schedule which minimizes the expected cost (2) under the corresponding 
constraints. The current paper is devoted to the case of shared resources. The case of 
independent resources was studied in (Finkelstein, Markovitch, & Rivlin, 2002). 

The scheduled algorithms considered in this framework can be viewed as anytime algo- 
rithms. The behavior of anytime algorithms is usually characterized by their performance 
profile - the expected quality of the algorithm output as a function of the allotted resources. 
The goal predicate G can be viewed as a quality function with two possible values, and thus 
the distribution function F(t) meets the definition of performance profile, where time plays 
the role of resource. 

4. Suspend-Resume Based Scheduling 

In this section we consider the case of suspend-resume based control ($\sigma_i$ are continuous 
functions with derivatives 0 or 1). 

Claim 1 The expressions for the goal-time distribution $F_n(t, \sigma_1, \ldots, \sigma_n)$ and the expected 
cost $E_u(\sigma_1, \ldots, \sigma_n)$ are as follows:⁶ 

\[ F_n(t, \sigma_1, \ldots, \sigma_n) = 1 - \prod_{i=1}^{n} (1 - F_i(\sigma_i)), \tag{3} \]

\[ E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{+\infty} \Bigl( u'_t + \sum_{i=1}^{n} \sigma'_i u'_{\sigma_i} \Bigr) \prod_{i=1}^{n} (1 - F_i(\sigma_i))\, dt. \tag{4} \]

Proof: Let $t_i$ be the time it would take the $i$-th process to meet the goal if it acted alone 
(if the process fails to reach the goal, we consider $t_i = \infty$). Let $t^*$ be the time it takes 
the system of $n$ processes to reach the goal. In this case, $t^*$ is distributed according to 
$F_n(t, \sigma_1, \ldots, \sigma_n)$, and $t_i$ are distributed according to $F_i(t)$. Thus, because the processes, 
given a schedule, are independent, we obtain 

\begin{align*}
F_n(t, \sigma_1, \ldots, \sigma_n) &= P(t^* \le t) = 1 - P(t^* > t) = 1 - P(t_1 > \sigma_1(t)) \times \cdots \times P(t_n > \sigma_n(t)) \\
&= 1 - (1 - F_1(\sigma_1(t))) \times \cdots \times (1 - F_n(\sigma_n(t))) = 1 - \prod_{i=1}^{n} (1 - F_i(\sigma_i(t))),
\end{align*}

which corresponds to (3). Since $F(t)$ is a distribution over time, we assume $F(t) = 0$ for 
$t < 0$. 

5. This fact is obvious for the case of suspend-resume control, and for intensity control it is reflected in 
Lemma 3. 

6. $u'_t$ and $u'_{\sigma_i}$ stand for partial derivatives of $u$ by $t$ and by $\sigma_i$ respectively. 
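The average cost function will therefore be 

\begin{align*}
E_u(\sigma_1, \ldots, \sigma_n) &= \int_0^{+\infty} u(t, \sigma_1, \ldots, \sigma_n)\, f_n(t, \sigma_1, \ldots, \sigma_n)\, dt \\
&= -\int_0^{+\infty} u(t, \sigma_1, \ldots, \sigma_n)\, d\bigl(1 - F_n(t, \sigma_1, \ldots, \sigma_n)\bigr) \\
&= -u(t, \sigma_1, \ldots, \sigma_n)\bigl(1 - F_n(t, \sigma_1, \ldots, \sigma_n)\bigr)\Big|_0^{\infty}
   + \int_0^{+\infty} \frac{du(t, \sigma_1, \ldots, \sigma_n)}{dt} \prod_{i=1}^{n} (1 - F_i(\sigma_i))\, dt.
\end{align*}

Since $u(0, \sigma_1, \ldots, \sigma_n) = 0$ and $F_n(\infty, \sigma_1, \ldots, \sigma_n) = 1$, the first term in the last expression 
is 0. Besides, since the full derivative of $u$ by $t$ can be written as 

\[ \frac{du(t, \sigma_1, \ldots, \sigma_n)}{dt} = u'_t + \sum_{i=1}^{n} \sigma'_i u'_{\sigma_i}, \]

we obtain 

\[ E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{+\infty} \Bigl( u'_t + \sum_{i=1}^{n} \sigma'_i u'_{\sigma_i} \Bigr) \prod_{i=1}^{n} (1 - F_i(\sigma_i))\, dt, \]

which completes the proof. 
Q.E.D. 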

Note that in the case of $\sigma_i(t) = t$ and $F_i(t) = F(t)$ for all $i$ (parallel application of $n$ 
F-equivalent processes), we obtain the formula presented in (Janakiram et al., 1988), i.e., 
$F_n(t) = 1 - (1 - F(t))^n$. 
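
Claim 1 can be checked numerically on the DFS example of Section 2.1. The sketch below assumes the cost $u = t$ (so that the integrand of (4) is just the product of the survival terms) and two F-equivalent processes that alternate with $A_1$ first; the function names and the list-of-switch-points encoding are illustrative assumptions. Because the distribution is a step function with integer breakpoints, a unit-step midpoint sum integrates it exactly.

```python
def cdf(x):
    """F(x) = P(path length <= x) for the DFS example of Section 2.1."""
    if x < 10:
        return 0.0
    if x < 40:
        return 0.5
    if x < 160:
        return 0.75
    return 1.0


def sigmas(t, switch_points):
    """Subjective times (sigma_1, sigma_2) at objective time t.

    A1 and A2 alternate, A1 first; switch_points = [t_1, t_2, ...].
    After the last switch point the current process keeps running.
    """
    sigma = [0.0, 0.0]
    prev = 0.0
    for k, point in enumerate(list(switch_points) + [float("inf")]):
        sigma[k % 2] += max(0.0, min(t, point) - prev)
        if t <= point:
            break
        prev = point
    return sigma


def goal_time_cdf(t, switch_points):
    """Equation (3) for two F-equivalent processes."""
    s1, s2 = sigmas(t, switch_points)
    return 1.0 - (1.0 - cdf(s1)) * (1.0 - cdf(s2))


def expected_time(switch_points, horizon=400):
    """Equation (4) with u = t: the integral of 1 - F_n over [0, horizon]."""
    return sum(1.0 - goal_time_cdf(n + 0.5, switch_points)
               for n in range(horizon))


print(expected_time([]))                 # A1 alone
print(expected_time([10, 20, 50, 80]))   # interleaved schedule of Figure 4
```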

In the rest of this section we show a formal solution (necessary conditions and an algo- 
rithm) for the framework with shared resources. We start with two processes and present 
the formulas and the algorithm, and then generalize the solution for an arbitrary number 
of processes. For the case of two processes, we only assume that u is differentiable. 

For the more elaborate setup of $n$ processes, we assume that the total cost is a linear 
combination of the objective time and all the subjective times, and the subjective times are 
of the same weight: 

\[ u(t, \sigma_1, \ldots, \sigma_n) = at + b \sum_{i=1}^{n} \sigma_i(t). \tag{5} \]

Since time is consumed if and only if there is an active process, and the trivial case where 
all the processes are idle may be ignored, we obtain (without loss of generality) 

\[ E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j))\, dt \to \min. \tag{6} \]


This assumption is made to keep the expressions more readable. The solution process 
remains the same for the general form of u. 
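
The step from (4) and (5) to (6) can be spelled out as follows. This is a sketch under two assumptions: the shared-resource constraint, under which exactly one process is active at each moment so that the derivatives of the $\sigma_i$ sum to one, and $a + b > 0$, which is not stated explicitly in the text.

```latex
\begin{align*}
u'_t = a, \quad u'_{\sigma_i} = b, \quad \sum_{i=1}^{n} \sigma'_i = 1
\;\Longrightarrow\;
u'_t + \sum_{i=1}^{n} \sigma'_i u'_{\sigma_i} &= a + b, \\
E_u(\sigma_1, \ldots, \sigma_n)
  &= (a + b) \int_0^{\infty} \prod_{j=1}^{n} \bigl(1 - F_j(\sigma_j)\bigr)\, dt .
\end{align*}
```

A positive constant factor does not change the minimizing schedule, so it can be dropped, which gives the form (6).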

4.1 Necessary Conditions for an Optimal Solution for Two Processes 

Let $A_1$ and $A_2$ be two processes sharing a resource. While working, one process locks 
the resource, and the other is necessarily idle. We can show that such dependency yields 
a strong constraint on the behavior of the process, allowing the building of an effective 
algorithm for solving the minimization problem. 

For the suspend-resume model, therefore, only two states of the system are possible: 
$A_1$ is active and $A_2$ is idle ($S_1$); and $A_1$ is idle and $A_2$ is active ($S_2$). We ignore the 
case where both processes are idle, since removing such a state from the schedule will not 
increase the cost. Therefore, the system continuously alternates between the two states: 
$S_1 \to S_2 \to S_1 \to S_2 \to \ldots$. We call the time interval corresponding to each pair $\langle S_1, S_2 \rangle$ 
a phase and denote phase $k$ by $\Phi_k$. If we denote the process switch points by $t_i$, the phase 
$\Phi_k$ corresponds to $[t_{2k-2}, t_{2k}]$. See Figure 8 for an illustration. 
Figure 8: Notations for times, states and phases for two processes 



By this scheme, $A_1$ is active in the intervals $[t_0, t_1], [t_2, t_3], \ldots, [t_{2k}, t_{2k+1}], \ldots$, and 
$A_2$ is active in the intervals $[t_1, t_2], [t_3, t_4], \ldots, [t_{2k+1}, t_{2k+2}], \ldots$. 

Let us denote by $\zeta_{2k-1}$ the total time that $A_1$ has been active before $t_{2k-1}$, and by 
$\zeta_{2k}$ the total time that $A_2$ has been active before $t_{2k}$. By phase definition, $\zeta_{2k-1}$ and $\zeta_{2k}$ 
correspond to the cumulative time spent in phases 1 to $k$ in states $S_1$ and $S_2$ respectively. 
There exists a one-to-one correspondence between the sequences $\zeta_i$ and $t_i$: 

\[ \zeta_i + \zeta_{i+1} = t_{i+1}. \tag{7} \]

Moreover, by definition of $\zeta_i$ we have 

\begin{align*}
\sigma_1(t_{2k-1}) &= \sigma_1(t_{2k}) = \zeta_{2k-1}, \\
\sigma_2(t_{2k}) &= \sigma_2(t_{2k+1}) = \zeta_{2k}.
\end{align*} \tag{8}

Under the process switch scheme as defined above, the subjective schedule functions $\sigma_1$ 
and $\sigma_2$ in time intervals $[t_{2k}, t_{2k+1}]$ (state $S_1$ of phase $\Phi_{k+1}$) have the form 

\begin{align*}
\sigma_1(t) &= t - t_{2k} + \sigma_1(t_{2k}) = t - t_{2k} + \zeta_{2k-1} = t - \zeta_{2k}, \\
\sigma_2(t) &= \sigma_2(t_{2k}) = \zeta_{2k}.
\end{align*} \tag{9}

Similarly, in the intervals $[t_{2k+1}, t_{2k+2}]$ (state $S_2$ of phase $\Phi_{k+1}$), the subjective schedule 
functions are defined as 

\begin{align*}
\sigma_1(t) &= \sigma_1(t_{2k+1}) = \zeta_{2k+1}, \\
\sigma_2(t) &= t - t_{2k+1} + \sigma_2(t_{2k+1}) = t - t_{2k+1} + \zeta_{2k} = t - \zeta_{2k+1}.
\end{align*} \tag{10}
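
The correspondence (7) between the cumulative subjective times $\zeta_i$ and the switch points $t_i$ can be sketched as a pair of conversion functions. The function names are assumptions made for this illustration, and the example values are those of the interleaved schedule in Section 2.1.

```python
def switch_points_from_zeta(zeta):
    """Apply (7): t_{i+1} = zeta_i + zeta_{i+1}.

    `zeta` = [zeta_0, zeta_1, ...] with zeta_0 = 0; returns [t_1, t_2, ...].
    """
    return [zeta[i] + zeta[i + 1] for i in range(len(zeta) - 1)]


def zeta_from_switch_points(points):
    """Invert (7): zeta_{i+1} = t_{i+1} - zeta_i, starting from zeta_0 = 0."""
    zeta = [0]
    for t in points:
        zeta.append(t - zeta[-1])
    return zeta


# Section 2.1: A1 runs to 10, A2 to 10, A1 to 40, A2 to 40, A1 to 160.
zeta = [0, 10, 10, 40, 40, 160]
points = switch_points_from_zeta(zeta)
print(points)
print(zeta_from_switch_points(points) == zeta)
```

Odd-indexed entries of `zeta` are subjective times of $A_1$ and even-indexed entries are subjective times of $A_2$, matching (8).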

Let us denote 

\[ v(t_1, t_2) = u'_t(t_1 + t_2, t_1, t_2) + u'_{\sigma_1}(t_1 + t_2, t_1, t_2) + u'_{\sigma_2}(t_1 + t_2, t_1, t_2) \]

and 

\[ v_i(t_1, t_2) = u'_t(t_1 + t_2, t_1, t_2) + u'_{\sigma_i}(t_1 + t_2, t_1, t_2). \]

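To provide an optimal solution for the suspend/resume model, we may split (4) into phases 
$\Phi_k$ and write it as 

\[ E_u(\sigma_1, \ldots, \sigma_n) = \sum_{k=1}^{\infty} \int_{t_{2k-2}}^{t_{2k}} v(\sigma_1, \sigma_2)(1 - F_1(\sigma_1))(1 - F_2(\sigma_2))\, dt. \tag{11} \]

The last expression may be rewritten as 

\begin{align*}
E_u(\sigma_1, \ldots, \sigma_n) = {} & \sum_{k=0}^{\infty} \int_{t_{2k}}^{t_{2k+1}} v(\sigma_1, \sigma_2)(1 - F_1(\sigma_1))(1 - F_2(\sigma_2))\, dt \; + \\
& \sum_{k=0}^{\infty} \int_{t_{2k+1}}^{t_{2k+2}} v(\sigma_1, \sigma_2)(1 - F_1(\sigma_1))(1 - F_2(\sigma_2))\, dt.
\end{align*} \tag{12}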



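Using (9) on interval $[t_{2k}, t_{2k+1}]$, performing substitution $x = t - \zeta_{2k}$, and using (7), we 
obtain 

\begin{align*}
& \int_{t_{2k}}^{t_{2k+1}} v(\sigma_1, \sigma_2)(1 - F_1(\sigma_1))(1 - F_2(\sigma_2))\, dt \\
&= \int_{t_{2k}}^{t_{2k+1}} v_1(t - \zeta_{2k}, \zeta_{2k})(1 - F_1(t - \zeta_{2k}))(1 - F_2(\zeta_{2k}))\, dt \\
&= \int_{t_{2k} - \zeta_{2k}}^{t_{2k+1} - \zeta_{2k}} v_1(x, \zeta_{2k})(1 - F_1(x))(1 - F_2(\zeta_{2k}))\, dx \\
&= \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))(1 - F_2(\zeta_{2k}))\, dx.
\end{align*} \tag{13}

Similarly, for the interval $[t_{2k+1}, t_{2k+2}]$ we have 

\begin{align*}
& \int_{t_{2k+1}}^{t_{2k+2}} v(\sigma_1, \sigma_2)(1 - F_1(\sigma_1))(1 - F_2(\sigma_2))\, dt \\
&= \int_{t_{2k+1}}^{t_{2k+2}} v_2(\zeta_{2k+1}, t - \zeta_{2k+1})(1 - F_1(\zeta_{2k+1}))(1 - F_2(t - \zeta_{2k+1}))\, dt \\
&= \int_{t_{2k+1} - \zeta_{2k+1}}^{t_{2k+2} - \zeta_{2k+1}} v_2(\zeta_{2k+1}, x)(1 - F_1(\zeta_{2k+1}))(1 - F_2(x))\, dx \\
&= \int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_1(\zeta_{2k+1}))(1 - F_2(x))\, dx.
\end{align*} \tag{14}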
Substituting (13) and (14) into (12), we obtain a new form for the minimization problem: 

\begin{align*}
E_u(\zeta_1, \ldots, \zeta_n) = \sum_{k=0}^{\infty} \Bigl[ & (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx \; + \\
& (1 - F_1(\zeta_{2k+1})) \int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_2(x))\, dx \Bigr] \to \min
\end{align*} \tag{15}

(for the sake of generality, we assume $\zeta_{-1} = 0$). 
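
Expression (15) can be evaluated directly from a sequence $\zeta_0, \zeta_1, \ldots$. The sketch below assumes the linear cost with $v_1 = v_2 = 1$ and uses the step distribution of the DFS example in Section 2.1 for both processes; the function names and the integer-grid integration are assumptions made for illustration. Term $i$ of the loop covers the interval in which the active process advances from $\zeta_{i-2}$ to $\zeta_i$ while the other stays at $\zeta_{i-1}$.

```python
def cdf(x):
    """F(x) = P(path length <= x) for the DFS example of Section 2.1."""
    if x < 10:
        return 0.0
    if x < 40:
        return 0.5
    if x < 160:
        return 0.75
    return 1.0


def integral_of_survival(f, lo, hi):
    """Integral of 1 - f over [lo, hi] for integers lo <= hi.

    Exact here because f is constant on every interval [n, n + 1).
    """
    return sum(1.0 - f(n + 0.5) for n in range(lo, hi))


def expected_cost(zeta, cdf_1, cdf_2):
    """Evaluate (15) with v_1 = v_2 = 1 for zeta = [zeta_0 = 0, zeta_1, ...]."""
    total = 0.0
    for i in range(1, len(zeta)):
        # Odd i: A1 is active; even i: A2 is active.
        active, other = (cdf_1, cdf_2) if i % 2 == 1 else (cdf_2, cdf_1)
        start = zeta[i - 2] if i >= 2 else 0   # zeta_{-1} = 0
        frozen = 1.0 - other(zeta[i - 1])
        total += frozen * integral_of_survival(active, start, zeta[i])
    return total


interleaved = [0, 10, 10, 40, 40, 160]
alternating = [0] + [j for j in range(1, 161) for _ in range(2)]
print(expected_cost(interleaved, cdf, cdf))
print(expected_cost(alternating, cdf, cdf))
```

Both values agree with the expected times computed by hand in Section 2.1.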

The minimization problem (15) is equivalent to the original problem (4), and the 
dependency between their solutions is described by (9) and (10). The only constraint for the new 
problem follows from the fact that the processes are alternating for non-negative periods of 
time: 

\begin{align*}
& \zeta_0 = 0 \le \zeta_2 \le \ldots \le \zeta_{2n} \le \ldots \\
& \zeta_1 \le \zeta_3 \le \ldots \le \zeta_{2n+1} \le \ldots
\end{align*} \tag{16}

The expression (15) reaches its optimal values either when 

\[ \frac{\partial E_u}{\partial \zeta_k} = 0 \quad \text{for } k = 1, \ldots, n, \ldots, \tag{17} \]

or on the border described by (16). However, for two processes we can, without loss of 
generality, ignore the border case. Indeed, assume that $\zeta_i = \zeta_{i+2}$ for some $i > 1$ (one of the 
processes skips its turn). We can construct a new schedule by removing $\zeta_{i+1}$ and $\zeta_{i+2}$: 

\[ \zeta_1, \ldots, \zeta_{i-1}, \zeta_i, \zeta_{i+3}, \zeta_{i+4}, \zeta_{i+5}, \ldots \]

It is easy to see that the process described by this schedule is exactly the same process as 
described by the original one, but the singularity point has been removed. 

Thus, at each step the time spent by the processes is determined by (17). We can see 
that $\zeta_{2k}$ appears in three subsequent terms of the sum (15): 

\begin{align*}
\ldots & + (1 - F_1(\zeta_{2k-1})) \int_{\zeta_{2k-2}}^{\zeta_{2k}} v_2(\zeta_{2k-1}, x)(1 - F_2(x))\, dx \\
& + (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx \\
& + (1 - F_1(\zeta_{2k+1})) \int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_2(x))\, dx + \ldots.
\end{align*}
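Differentiating (15) by $\zeta_{2k}$, therefore, yields 

\begin{align*}
& v_2(\zeta_{2k-1}, \zeta_{2k})(1 - F_1(\zeta_{2k-1}))(1 - F_2(\zeta_{2k})) \\
& - f_2(\zeta_{2k}) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx \\
& + (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} \frac{\partial v_1(x, \zeta_{2k})}{\partial \zeta_{2k}} (1 - F_1(x))\, dx \\
& - v_2(\zeta_{2k+1}, \zeta_{2k})(1 - F_1(\zeta_{2k+1}))(1 - F_2(\zeta_{2k})) \\
={} & (1 - F_2(\zeta_{2k})) \bigl( v_2(\zeta_{2k-1}, \zeta_{2k})(1 - F_1(\zeta_{2k-1})) - v_2(\zeta_{2k+1}, \zeta_{2k})(1 - F_1(\zeta_{2k+1})) \bigr) \\
& - f_2(\zeta_{2k}) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx \\
& + (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} \frac{\partial v_1(x, \zeta_{2k})}{\partial \zeta_{2k}} (1 - F_1(x))\, dx.
\end{align*}

A similar expression can be derived by differentiating (15) by $\zeta_{2k+1}$. Combining these 
expressions with (17) gives us the following theorem: 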

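Theorem 1 (The chain theorem for two processes) 

The value for $\zeta_{i+1}$ for $i \ge 2$ can be computed for given $\zeta_{i-1}$ and $\zeta_i$ using the formulas 

\begin{align*}
\frac{f_2(\zeta_{2k})}{1 - F_2(\zeta_{2k})} = {} &
\frac{v_2(\zeta_{2k-1}, \zeta_{2k})(1 - F_1(\zeta_{2k-1})) - v_2(\zeta_{2k+1}, \zeta_{2k})(1 - F_1(\zeta_{2k+1}))}
     {\int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx} \\
& + \frac{\int_{\zeta_{2k-1}}^{\zeta_{2k+1}} \frac{\partial v_1(x, \zeta_{2k})}{\partial \zeta_{2k}} (1 - F_1(x))\, dx}
     {\int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\, dx},
\qquad i = 2k + 1,
\end{align*} \tag{18}

\begin{align*}
\frac{f_1(\zeta_{2k+1})}{1 - F_1(\zeta_{2k+1})} = {} &
\frac{v_1(\zeta_{2k+1}, \zeta_{2k})(1 - F_2(\zeta_{2k})) - v_1(\zeta_{2k+1}, \zeta_{2k+2})(1 - F_2(\zeta_{2k+2}))}
     {\int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_2(x))\, dx} \\
& + \frac{\int_{\zeta_{2k}}^{\zeta_{2k+2}} \frac{\partial v_2(\zeta_{2k+1}, x)}{\partial \zeta_{2k+1}} (1 - F_2(x))\, dx}
     {\int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_2(x))\, dx},
\qquad i = 2k + 2.
\end{align*} \tag{19}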



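Corollary 1 For the linear cost function (5), the value for $\zeta_{i+1}$ for $i \ge 2$ can be computed 
for given $\zeta_{i-1}$ and $\zeta_i$ using the formulas 

\[ \frac{f_2(\zeta_{2k})}{1 - F_2(\zeta_{2k})} = \frac{F_1(\zeta_{2k+1}) - F_1(\zeta_{2k-1})}{\int_{\zeta_{2k-1}}^{\zeta_{2k+1}} (1 - F_1(x))\, dx}, \qquad i = 2k + 1, \tag{20} \]

\[ \frac{f_1(\zeta_{2k+1})}{1 - F_1(\zeta_{2k+1})} = \frac{F_2(\zeta_{2k+2}) - F_2(\zeta_{2k})}{\int_{\zeta_{2k}}^{\zeta_{2k+2}} (1 - F_2(x))\, dx}, \qquad i = 2k + 2. \tag{21} \]

The proof follows immediately from the fact that $v_i(t_1, t_2) = a + b$. 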
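
Equation (20) generally has to be solved numerically for $\zeta_{2k+1}$. The following sketch scans a grid for sign changes of the residual and refines each one by bisection. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the trapezoid integration, and the grid search are all choices made here. As a check, two processes with a uniform distribution on [0, 1] are assumed, for which (20) reduces to $\zeta_{2k+1} = 2\zeta_{2k} - \zeta_{2k-1}$.

```python
def integrate(fn, lo, hi, steps=1000):
    """Composite trapezoid rule for the integral of fn over [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(fn(lo + j * h) for j in range(1, steps))
    return h * (0.5 * (fn(lo) + fn(hi)) + inner)


def residual_20(z, zeta_prev, zeta_cur, cdf_1, cdf_2, pdf_2):
    """Left side of (20) times the integral, minus the numerator of (20)."""
    hazard = pdf_2(zeta_cur) / (1.0 - cdf_2(zeta_cur))
    area = integrate(lambda x: 1.0 - cdf_1(x), zeta_prev, z)
    return hazard * area - (cdf_1(z) - cdf_1(zeta_prev))


def roots_of_20(zeta_prev, zeta_cur, cdf_1, cdf_2, pdf_2, upper, grid=200):
    """Candidate values of zeta_{2k+1} in (zeta_prev, upper].

    Only strict sign changes between grid points are detected, so a root
    that falls exactly on a grid point may be missed.
    """
    def g(z):
        return residual_20(z, zeta_prev, zeta_cur, cdf_1, cdf_2, pdf_2)

    step = (upper - zeta_prev) / grid
    roots = []
    lo = zeta_prev + step          # skip the trivial root z = zeta_prev
    g_lo = g(lo)
    for j in range(2, grid + 1):
        hi = zeta_prev + j * step
        g_hi = g(hi)
        if g_lo * g_hi < 0:        # sign change: refine by bisection
            a, b, g_a = lo, hi, g_lo
            for _ in range(60):
                mid = 0.5 * (a + b)
                g_mid = g(mid)
                if g_a * g_mid > 0:
                    a, g_a = mid, g_mid
                else:
                    b = mid
            roots.append(0.5 * (a + b))
        lo, g_lo = hi, g_hi
    return roots


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


roots = roots_of_20(0.1, 0.3, uniform_cdf, uniform_cdf, uniform_pdf, upper=1.0)
print(roots)   # expected to be close to 2 * 0.3 - 0.1 = 0.5
```

The number of roots returned corresponds to the branching of candidate continuations; each root satisfies only the necessary condition and still has to be evaluated with (15).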

Theorem 1 allows us to formulate an algorithm for building an optimal solution. This 
algorithm is presented in the next subsection. 
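4.2 Optimal Solution for Two Processes: an Algorithm

The goal of the scheduling algorithm is to minimize the expression (15)

\[
\sum_{k=0}^{\infty} \Bigl[ (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} v_1(x, \zeta_{2k})(1 - F_1(x))\,dx
+ (1 - F_1(\zeta_{2k+1})) \int_{\zeta_{2k}}^{\zeta_{2k+2}} v_2(\zeta_{2k+1}, x)(1 - F_2(x))\,dx \Bigr] \to \min
\]

under the constraints

\[
\begin{cases}
\zeta_0 = 0 < \zeta_2 < \ldots < \zeta_{2n} < \ldots, \\
\zeta_1 < \zeta_3 < \ldots < \zeta_{2n+1} < \ldots
\end{cases}
\]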



Assume that $A_1$ acts first ($\zeta_1 > 0$). From Theorem 1 we can see that the values of
$\zeta_0 = 0$ and $\zeta_1$ determine the set of possible values for $\zeta_2$, the values of $\zeta_1$ and $\zeta_2$ determine
the possible values for $\zeta_3$, and so on.

Therefore, a non-zero value for $\zeta_1$ provides us with a tree of possible values of $\zeta_k$. The
branching factor of this tree is determined by the number of roots of (18) and (19). Each
possible sequence $\zeta_1, \zeta_2, \ldots$ can be evaluated using (15).

For the cases where the total time is limited as discussed in Section 4.5, or where the 
series in that expression converge, e.g., when each process has a finite cost of finding a 
solution, the algorithm stops after a finite number of points. In some cases, however, such 
as for extremely heavy-tailed distributions, it is possible that the above series diverge. To 
ensure a finite number of iterations in such cases, we set an upper limit on the maximal 
expected cost. 

Another limit is added for the probability of failure. Since $t_i = \zeta_{i-1} + \zeta_i$, the probability
that both runs would not be able to find a solution after $t_i$ is

\[
(1 - F_1(\zeta_{i-1}))(1 - F_2(\zeta_i)).
\]

Therefore, if this probability becomes small enough, we can conclude that both runs failed to
find a solution and stop the execution.

For each value of $\zeta_1$ we can find the best sequence using one of the standard search
algorithms, such as Branch-and-Bound. Let us denote the value of the best sequence for
each $\zeta_1$ by $E_u(\zeta_1)$. Performing global optimization of $E_u(\zeta_1)$ by $\zeta_1$ provides us with an
optimal solution for the case where $A_1$ acts first. Note that the value of $\zeta_1$ may also be 0
($A_2$ acts first), so we need to compare the value obtained by optimization of $\zeta_1$ with the
value obtained by optimization of $\zeta_2$ where $\zeta_1 = 0$.

The flow of the algorithm is illustrated in Figure 9, the formal scheme is presented in
Figure 10, and the description of the main routine (realized by the DFS Branch and Bound
method) in Figure 11.
The algorithm considers two main branches, one for $A_1$ and one for $A_2$, and they are
processed by procedure minimize_sequence_by_first_point (Figure 10). At each step, we
initialize the array of $\zeta$ values, and pass it, through the procedure build_optimal_sequence,
to the recursive procedure dfsbnb, which represents the core of the algorithm (Figure 11).

The dfsbnb procedure, shown in Figure 11, acts as follows. It obtains as an input the
array of $\zeta$ values, the cost involved up to the current moment, and the best value reached till
now. If the cost exceeds this value, the procedure performs a classical Branch-and-Bound
cutoff (lines 1-2).

The inner loop (lines 4-19) corresponds to different roots of the expressions (18)
and (19). The new value of $\zeta$ corresponding to $\zeta_k$ is calculated by the procedure
calculate_next_zeta (line 5), and it must be greater than the previously found root saved in
last_value (for the first iteration, last_value is initialized to $\zeta_{k-2}$; lines 3 and 8). Lines 6-7
correspond to the case where the lower bound passed to calculate_next_zeta exceeds the
maximal available time, and in this case the procedure is stopped.

After the new possible value of $\zeta$ is found, the procedure updates the current cost (line
9), and the stopping criteria mentioned above are validated for the new array of $\zeta$ values,
which is denoted as a concatenation of the old array and the new value of $\zeta$ (line 10). If
the task is accomplished, the cost is verified versus the best known value (which is updated
if necessary), and the procedure returns (lines 10-16). Otherwise, $\zeta$ is temporarily added
to the array of $\zeta$ values, and the Branch-and-Bound procedure is called recursively for the
calculation of $\zeta_{k+1}$.

When the whole tree is traversed (except the cutoffs), the best known cost is returned
(line 20). The corresponding array of $\zeta$ is the required solution.
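The search pattern described above can be illustrated by a small generic depth-first Branch-and-Bound routine in Python. This is a simplified sketch under stated assumptions: the callbacks `candidates`, `step_cost` and `is_leaf` are hypothetical stand-ins for calculate_next_zeta, calculate_partial_cost and task_accomplished, step costs are assumed non-negative (otherwise the cutoff is not valid), and, unlike Figure 11, the loop keeps iterating over the remaining candidates after a leaf is found.

```python
import math


def dfs_bnb(prefix, cost, best, candidates, step_cost, is_leaf):
    """Depth-first Branch and Bound over sequences of switch points.

    best is a dict {"cost": float, "seq": list} that is updated in place.
    candidates(prefix)   -> iterable of admissible next values
    step_cost(prefix, z) -> non-negative cost added by appending z
    is_leaf(seq)         -> True when the stopping criterion holds
    """
    if cost >= best["cost"]:          # cutoff: cannot beat the incumbent
        return
    for z in candidates(prefix):
        new_cost = cost + step_cost(prefix, z)
        seq = prefix + [z]
        if is_leaf(seq):
            if new_cost < best["cost"]:
                best["cost"], best["seq"] = new_cost, seq
        else:
            dfs_bnb(seq, new_cost, best, candidates, step_cost, is_leaf)


# Toy problem (not from the paper): each step may advance by 1 or 2,
# the cost of a step is the value reached, and a sequence of length 4 is a leaf.
best = {"cost": math.inf, "seq": None}
dfs_bnb([0], 0.0, best,
        candidates=lambda p: (p[-1] + 1, p[-1] + 2),
        step_cost=lambda p, z: float(z),
        is_leaf=lambda s: len(s) == 4)
print(best)
```

In the toy problem the first leaf reached is already the cheapest one, so most of the remaining branches are pruned by the cutoff test at the top of the routine.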

Figure 13 shows a trace of a single Branch-and-Bound run for the example shown in
Section 2.2 starting with the optimal value of $\zeta_1$. The optimal schedule derived from the
run is 679, 2393, 7815, 17184 with expected cost of 1216.49 steps. The scheduling points are
given in subjective times. Using objective (total) time the schedule can be written as 679,
3072, 10208, and 25000. In this particular run there were no Branch-and-Bound cutoffs due
to the small number of roots of (18) and (19).
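The conversion between the two time scales follows from the relation $t_i = \zeta_{i-1} + \zeta_i$ with $\zeta_0 = 0$ used earlier in this subsection. A minimal Python sketch (the function name is an assumption made for illustration):

```python
def to_objective_times(zetas):
    """Convert switch points given in subjective time (zeta_1, zeta_2, ...)
    into objective (total) time, using t_i = zeta_{i-1} + zeta_i, zeta_0 = 0."""
    objective = []
    prev = 0                      # zeta_0 = 0
    for z in zetas:
        objective.append(prev + z)
        prev = z
    return objective


schedule = to_objective_times([679, 2393, 7815, 17184])
print(schedule)
```

With the rounded values quoted above, the last point comes out as 24999 rather than 25000; the difference is presumably due to rounding of the fractional switch points.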

4.3 Necessary Conditions for an Optimal Solution for n Processes 

In this section we generalize our solution from the case of two processes to the case of n 
processes. 

Assume that we have n processes $A_1, \ldots, A_n$ using shared resources. One of the possible
ways to present a schedule is to use a sequence

\[
((A_{i_1}, \Delta t_1), (A_{i_2}, \Delta t_2), \ldots, (A_{i_j}, \Delta t_j), \ldots),
\]

where $A_{i_j}$ is the j-th active process, and $\Delta t_j$ is the time allocated for this invocation of $A_{i_j}$.

To simplify the formalization of the problem, however, we use the following alternative
representation. First, we allow $\Delta t_j$ to be 0, which makes it possible to represent every
schedule as

\[
((A_1, \Delta t_1), (A_2, \Delta t_2), \ldots, (A_n, \Delta t_n), (A_1, \Delta t_{n+1}), (A_2, \Delta t_{n+2}), \ldots, (A_n, \Delta t_{2n}), \ldots).
\]
[Figure 9 diagram. Branch "$A_1$ acts first": minimization by $\zeta_1$. Tree levels, top to bottom: $\zeta_1$ trials by the minimization procedure (root of the Branch and Bound tree); $\zeta_2$ satisfying (19) for k = 0 (Branch and Bound nodes); $\zeta_3$ satisfying (18) for k = 1 (Branch and Bound nodes); $\zeta_4$ satisfying (19) for k = 1. Final step: get the optimal schedule costs for $\zeta_1 = 0$ and for $\zeta_1 \ne 0$, and return the best value with the corresponding schedule. Legend: open circles are Branch and Bound non-leaf nodes; filled circles are leaf nodes (terminating condition satisfied) and cutoff nodes (expected result is worse than the already known). The cost is calculated in accordance with (15).]

Figure 9: The flow of the algorithm for constructing optimal schedules for 2 processes
procedure optimize
    Input: F_1(t), F_2(t) (performance profiles).
    Output: An optimal sequence and its value.

    [sequence_1, val_1] ← minimize_sequence_by_first_point(A_1)
    [sequence_2, val_2] ← minimize_sequence_by_first_point(A_2)
    if val_1 < val_2 then
        return [sequence_1, val_1]
    else
        return [sequence_2, val_2]
    end
end

procedure minimize_sequence_by_first_point(process)
    zetas[-1] ← 0
    zetas[0] ← 0
    if process = A_2 then
        zetas[1] ← 0
    end
    Using one of the standard minimization methods, find zetas,
    minimizing the value of the function build_optimal_sequence(zetas),
    and the corresponding cost.
end

Figure 10: Procedure optimize builds an optimal sequence for the case when $A_1$ starts, an optimal
sequence for the case when $A_2$ starts, compares the results, and returns the best
one. Procedure minimize_sequence_by_first_point returns an optimal sequence and its
value.
procedure build_optimal_sequence(zetas)
    curr_cost ← calculate_cost(zetas)
    return dfsbnb(zetas, curr_cost, MAX_VALUE)
end

procedure dfsbnb(zetas, curr_cost, thresh)
 1:  if (curr_cost > thresh) then
 2:      return MAX_VALUE // Cutoff
 3:  last_value ← zetas[length(zetas) - 2] // The previous time value
 4:  repeat
 5:      ζ ← calculate_next_zeta(zetas, last_value)
 6:      if (ζ = last_value) then // Skip
 7:          return thresh
 8:      last_value ← ζ
 9:      delta_cost ← calculate_partial_cost(zetas, ζ)
10:      if (task_accomplished([zetas || ζ])) then // Leaf
11:          if (curr_cost + delta_cost < thresh) then
12:              optimal_zetas ← [zetas || ζ]
13:              thresh ← curr_cost + delta_cost
14:          end
15:          return thresh
16:      end
17:      tmp_result ← dfsbnb([zetas || ζ], curr_cost + delta_cost, thresh)
18:      thresh ← min(thresh, tmp_result)
19:  end // repeat
20:  return thresh
end

Figure 11: Procedure build_optimal_sequence, given the prefix of the time sequence, restores the
optimal sequence with this prefix using the DFS Branch and Bound search algorithm,
and returns the sequence itself and its value. [x || y] stands for the concatenation of x and y.
Auxiliary functions are shown in Figure 12.
1. calculate_cost(zetas) computes the cost of the sequence (or its part) in accordance
with (15),

2. calculate_partial_cost(zetas, ζ) computes the additional cost obtained by adding
ζ to the sequence,

3. calculate_next_zeta(zetas, last_value) uses (18) or (19) to calculate the value of
the next ζ that is greater than last_value. If no such solution exists, the
maximal time value is returned,

4. task_accomplished(zetas) returns true when the task may be considered to be
accomplished (e.g., either maximal possible time is over, or the probability of
error is negligible, or the upper limit on the cost is exceeded).

Figure 12: Auxiliary functions used in the optimal schedule algorithm



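[Figure 13 diagram. Node labels of the trace: $\zeta_1$ = 679.0; $\zeta_2$ = 379.4; $\zeta_3$ = 24620.6 (u = 2664.54); $\zeta_4$ = 17184.6 (u = 1216.49).]

Figure 13: A trace of a single run of the Branch-and-Bound procedure starting with the optimal
value of $\zeta_1$.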
Therefore, the system alternates between n states $S_1 \to S_2 \to \ldots \to S_n \to S_1 \to \ldots$, where
the state $S_i$ corresponds to the situation where $A_i$ is active and the rest of the processes
are idle. The time spent in the k-th invocation of $S_i$ is $\Delta t_{kn+i}$.

As in the case of two processes, we call the time interval corresponding to the sequence
of states $S_1 \to S_2 \to \ldots \to S_n$ a phase and denote phase k by $\Phi_k$. We denote the process
switch points of $\Phi_k$ by $t_k^1, t_k^2, \ldots, t_k^n$, where $t_k^i$ is the total time allocated up to the end of
the k-th invocation of $S_i$, i.e., the sum of all $\Delta t_j$ with $j \le kn + i$.

Process $A_i$ is active in phase k in the interval $[t_k^{i-1}, t_k^i]$, and the entire phase lasts from $t_k^0$
to $t_k^n$. The corresponding scheme is shown in Figure 14.

[Figure 14 diagram: a time axis with switch points $t_{k-1}^n = t_k^0$, $t_k^1$, ..., $t_k^{n-1}$, $t_k^n = t_{k+1}^0$, $t_{k+1}^1$; states $S_1, S_2, \ldots, S_n$ within phase $\Phi_k$, followed by $S_1$ at the start of phase $\Phi_{k+1}$.]

Figure 14: Notations for times, states and phases for n processes



To simplify the following discussion, we would like to allow indices i in $t_k^i$ to be less than 0
or greater than n. For this purpose, we denote

\[
t_k^i = t_{k + \lfloor i/n \rfloor}^{\,i \bmod n}, \tag{22}
\]

and the index of the process active in the interval $[t_k^{i-1}, t_k^i]$ we denote by #i. For $i \bmod n \ne 0$
we obtain $\#i = i \bmod n$, while for $i \bmod n = 0$ we have $\#i = n$. Notation (22) claims
that the shift by n in the upper index is equivalent to the shift by 1 in the phase number:

\[
t_k^{i+n} = t_{k+1}^i.
\]
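The index conventions above translate directly into integer arithmetic. The sketch below is illustrative only (the function names are assumptions); it relies on the fact that Python's `//` and `%` operators implement floor division and a non-negative remainder for a positive modulus, which matches $\lfloor i/n \rfloor$ and $i \bmod n$ for negative i as well.

```python
def active_process(i, n):
    """Index #i of the process active in the interval ending at upper index i."""
    r = i % n
    return r if r != 0 else n


def canonical_index(k, i, n):
    """Map (phase k, upper index i) to (k + floor(i/n), i mod n), as in (22)."""
    return k + i // n, i % n


print(active_process(4, 3), active_process(3, 3), canonical_index(2, -1, 3))
```

The shift property of (22) can be checked by comparing `canonical_index(k, i + n, n)` with `canonical_index(k + 1, i, n)`.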

As in the case of two processes, we denote by $\zeta_k^i$ the total time that $A_{\#i}$ has been active
up to $t_k^i$. $\zeta_k^i$ corresponds to the cumulative time spent in phases 1 to k in state $S_{\#i}$, and
there is a one-to-one correspondence between the sequences of $\zeta_k^i$ and $t_k^i$:

\[
\zeta_k^i - \zeta_{k-1}^i = t_k^i - t_k^{i-1}, \tag{23}
\]

\[
\sum_{j=0}^{n-1} \zeta_k^{i-j} = t_k^i \quad \text{for } i \ge n. \tag{24}
\]

The first equation corresponds to the fact that the time between $t_k^{i-1}$ and $t_k^i$ is accumulated
into the $\zeta$ values of process $A_{\#i}$, while the second equation claims that at each switch the
objective time of the system is equal to the sum of the subjective times of each process. For
the sake of uniformity we also denote

\[
\zeta_{-1}^1 = \ldots = \zeta_{-1}^n = \zeta_0^0 = 0.
\]
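As an illustration of (23) and (24), the following Python sketch accumulates subjective and objective times for a round-robin allocation. The function name and the flat list of allocations are assumptions made for this example; slot m (counting from 0) is taken to belong to process (m mod n) + 1.

```python
def accumulate(deltas, n):
    """Return one row per slot: (process, its subjective time,
    objective time, sum of all subjective times)."""
    subjective = [0.0] * n        # cumulative active time per process
    objective = 0.0               # total elapsed time
    rows = []
    for m, dt in enumerate(deltas):
        p = m % n
        subjective[p] += dt       # (23): only the active process accumulates
        objective += dt
        rows.append((p + 1, subjective[p], objective, sum(subjective)))
    return rows


rows = accumulate([2.0, 1.0, 0.0, 3.0, 0.5, 4.0], 3)
for row in rows:
    print(row)
```

In every row the last two entries coincide, which is the content of (24): at each switch the objective time equals the sum of the subjective times.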
By construction of $\zeta_k^i$ we can see that in the time interval $[t_k^{i-1}, t_k^i]$ the subjective time of process
$A_j$ has the following form:

\[
\sigma_j(t) =
\begin{cases}
\zeta_k^j, & j = 1, \ldots, i-1, \\
t - t_k^{i-1} + \zeta_{k-1}^i, & j = i, \\
\zeta_{k-1}^j, & j = i+1, \ldots, n.
\end{cases}
\tag{25}
\]



The subjective time functions for a system with 3 processes are illustrated in Figure 15. 




Figure 15: Subjective time functions for a system with 3 processes 



To find an optimal schedule for a system with n processes, we need to minimize the
expression given by (6). The only constraints are the monotonicity of the sequence of $\zeta$ for
each process i:

\[
\zeta_k^i \le \zeta_{k+1}^i \quad \text{for each } k, i. \tag{26}
\]

Given the expressions for $\sigma_j$, we can prove the following lemma:

Lemma 1 For a system of n processes, the expression for the expected cost (6) can be
rewritten as

\[
E_u(\zeta_1, \ldots, \zeta_n, \ldots) = \sum_{k=0}^{\infty} \sum_{i=1}^{n} \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{k-1}^{j})) \int_{\zeta_{k-1}^{i}}^{\zeta_k^{i}} (1 - F_i(x))\,dx. \tag{27}
\]

The proof is given in Appendix A.1.
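The sum in (27) can be evaluated numerically for a finite schedule. The Python sketch below is an illustration under stated assumptions: the schedule is passed as a list of phases, each holding one allocation per process, the integrals are approximated by the trapezoid rule, and the function name is hypothetical. The product over $j = i+1, \ldots, i+n-1$ is realized by using, for every other process, its most recent subjective time, which is what the extended index notation expresses.

```python
import math


def expected_cost(phases, cdfs, steps=200):
    """Evaluate the sum in (27) for a finite round-robin schedule.

    phases: list of phases; phases[k][i] is the time given to process i+1 in phase k
    cdfs:   list of distribution functions F_i
    """
    n = len(cdfs)
    zeta = [0.0] * n                  # current subjective times, initially 0
    total = 0.0
    for alloc in phases:
        for i in range(n):
            lo, hi = zeta[i], zeta[i] + alloc[i]
            others = 1.0              # survival of all other processes
            for j in range(n):
                if j != i:
                    others *= 1.0 - cdfs[j](zeta[j])
            h = (hi - lo) / steps     # trapezoid rule for int (1 - F_i)
            integral = 0.5 * ((1.0 - cdfs[i](lo)) + (1.0 - cdfs[i](hi)))
            for s in range(1, steps):
                integral += 1.0 - cdfs[i](lo + s * h)
            total += others * integral * h
            zeta[i] = hi
    return total


exp_cdf = lambda x: 1.0 - math.exp(-x)
cost_a = expected_cost([[0.5, 0.5]] * 40, [exp_cdf, exp_cdf])
cost_b = expected_cost([[1.0, 0.0], [0.0, 1.0]] * 20, [exp_cdf, exp_cdf])
print(cost_a, cost_b)
```

For two exponential profiles with the same rate, the product of the survival functions depends only on the total elapsed time, so any schedule that always uses the processor should give approximately the same value (close to 1 for unit rate); the two calls above serve as a sanity check of that property.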

This lemma makes it possible to prove the chain theorem for an arbitrary number of 
processes: 
Theorem 2 (The chain theorem) The value for $\zeta_{m+1}^{l-1}$ may either be $\zeta_m^{l-1}$, or can be
computed given the previous $2n - 2$ values of $\zeta$ using the formula

\[
\frac{f_l(\zeta_m^l)}{1 - F_l(\zeta_m^l)} =
\frac{\prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^{j})) - \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^{j}))}
     {\sum_{i=l-n+1}^{l-1} \prod_{\substack{j=i+1 \\ \#j \ne l}}^{i+n-1} (1 - F_{\#j}(\zeta_m^{j})) \int_{\zeta_m^{i}}^{\zeta_{m+1}^{i}} (1 - F_{\#i}(x))\,dx}. \tag{28}
\]

The proof of the theorem is given in Appendix A.2.

4.4 Optimal Solution for n Processes: an Algorithm 

The goal of the presented algorithm is to minimize the expression (27)

\[
E_u(\zeta_1, \ldots, \zeta_n, \ldots) = \sum_{k=0}^{\infty} \sum_{i=1}^{n} \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{k-1}^{j})) \int_{\zeta_{k-1}^{i}}^{\zeta_k^{i}} (1 - F_i(x))\,dx
\]

under the constraints

\[
\zeta_k^i \le \zeta_{k+1}^i \quad \text{for each } k, i.
\]

As in the case of two processes, assume that $A_1$ acts first. By Theorem 2, given $2n - 2$
values of $\zeta$

\[
\zeta_0^0, \zeta_0^1, \ldots, \zeta_0^n, \zeta_1^1, \zeta_1^2, \ldots, \zeta_1^{n-3},
\]

we can determine all the possibilities for the value of $\zeta_1^{n-2}$ (either $\zeta_0^{n-2}$ if the process skips
its turn, or one of the roots of (28)). Given the values up to $\zeta_1^{n-2}$, we can determine the
values for $\zeta_1^{n-1}$, and so on.

The idea of the algorithm is similar to the algorithm for two processes. The first $2n - 2$
variables (including $\zeta_0^0 = 0$) determine the tree of possible values for $\zeta$. Optimization over
the first $2n - 3$ variables, therefore, provides us with an optimal schedule (as before, we compare
the results for the case where the first k < n variables are 0). The only difference from the
case of two processes is that a process may skip its turn. However, we can ignore the case
when all the processes skip their turn, since we can remove such a loop from the schedule.
The scheme of the algorithm is presented in Figure 16, and the description of the main
routine (realized by the DFS Branch and Bound method) is presented in Figure 17.



4.5 Optimal Solution in the Case of Additional Constraints 

Assume now that the problem has additional constraints: the solution time is limited by T
and the probability of success of the i-th process $p_i$ is not necessarily 1.

It is possible to show that the expressions for the distribution function and the expected
cost have almost the same form as in the regular framework:

Claim 2 Let the system solution time be limited by T, and let $p_i$ be the probability of success
for the i-th process. Then the expressions for the goal-time distribution and expected cost
// Procedure optimize builds n optimal schedules (each process may start first), compares
// the results, and returns the best one

procedure optimize
    best_val ← MAX_VALUE
    best_sequence ← ∅
    loop for i from 1 to n do
        [sequence, val] ← minimize_sequence_by_first_points(i)
        if (val < best_val) then
            best_val ← val
            best_sequence ← sequence
        end
    end
    return [best_sequence, best_val]
end

// Procedure minimize_sequence_by_first_points gets as a parameter
// the index of a process which starts, and returns an optimal
// sequence and its value

procedure minimize_sequence_by_first_points(process_to_start)
    loop for i from 0 to n - 1
        zetas[-i] ← 0
    end
    loop for i from 1 to process_to_start - 1
        zetas[i] ← 0
    end
    Using one of the standard minimization methods, find zetas,
    minimizing the value of the function build_optimal_sequence(zetas).
end

Figure 16: An algorithm for finding an optimal schedule for n processes. The result contains the
vector of $\zeta_i$, such that $\zeta_i = \zeta_0^i = \zeta_{\lfloor i/n \rfloor}^{\,i \bmod n}$.
procedure build_optimal_sequence(zetas)
    curr_cost ← calculate_cost(zetas)
    return dfsbnb(zetas, curr_cost, MAX_VALUE, 0)
end

procedure dfsbnb(zetas, curr_cost, thresh, nskip)
    if (curr_cost > thresh) then
        return MAX_VALUE // Cutoff
    last_value ← zetas[length(zetas) - n] // The previous time value for the current process
    repeat
        ζ ← calculate_next_zeta(zetas, last_value)
        if (ζ = last_value) then // Skip
            break loop
        last_value ← ζ
        delta_cost ← calculate_partial_cost(zetas, ζ)
        if (task_accomplished([zetas || ζ])) then // Leaf
            if (curr_cost + delta_cost < thresh) then
                optimal_zetas ← [zetas || ζ]
                thresh ← curr_cost + delta_cost
            end
            break loop
        end
        tmp_result ← dfsbnb([zetas || ζ], curr_cost + delta_cost, thresh, 0)
        thresh ← min(thresh, tmp_result)
    end // repeat
    if (nskip < n - 1) then // Skip is possible
        ζ ← zetas[length(zetas) - n]
        tmp_result ← dfsbnb([zetas || ζ], curr_cost, thresh, nskip + 1)
        thresh ← min(thresh, tmp_result)
    end
    return thresh
end

Figure 17: Procedure build_optimal_sequence, given the prefix of the time sequence, restores the
optimal sequence with this prefix using the DFS Branch and Bound search algorithm, and
returns the sequence itself and its value. [x || y] stands for the concatenation of x and y. The
auxiliary functions used are similar to their counterparts in Figure 12, but deal with n
processes instead of 2.
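are as follows:

\[
F_n(t, \sigma_1, \ldots, \sigma_n) = 1 - \prod_{i=1}^{n} (1 - p_i F_i(\sigma_i)) \quad (\text{for } t \le T), \tag{29}
\]

\[
E_u(\sigma_1, \ldots, \sigma_n) = \int_0^T \Bigl( u'_t + \sum_{i=1}^{n} \sigma'_i u'_{\sigma_i} \Bigr) \prod_{i=1}^{n} (1 - p_i F_i(\sigma_i))\,dt. \tag{30}
\]

The proof is similar to the proof of Claim 1.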

This claim shows that all the formulas used in the previous sections are valid for the
current settings, with three differences:

1. We use $p_j F_j$ instead of $F_j$ and $p_j f_j$ instead of $f_j$.

2. All the integrals are from 0 to T instead of from 0 to $\infty$.

3. All time variables are limited by T.

The first two conditions may be easily incorporated into all the algorithms. The last
condition implies additional changes in the chain theorems and the algorithms. The chain
theorem for n processes now becomes:

Theorem 3 The value for $\zeta_k^i$ can either be $\zeta_{k-1}^i$, or $\zeta_k^i$ can be computed given the previous
$2n - 2$ values of $\zeta$ using formula (28), or it can be calculated by the formula

\[
\zeta_k^i = T - \sum_{j=1}^{n-1} \zeta_k^{i-j}.
\]

The first two alternatives are similar to Theorem 2, while the third one corresponds to the
boundary condition given by Equation (24) with $t_k^i = T$. This third alternative adds one more branch
to the DFS Branch and Bound algorithm; the rest of the algorithm remains unchanged.

Similar changes in the algorithms are performed in the case of the maximal allowed
time $T_i$ per process. In practice, we always use this limitation, setting $T_i$ such that the
probability for $A_i$ to reach the goal after $T_i$, $p_i(1 - F_i(T_i))$, becomes negligible.

5. Process Scheduling by Intensity Control 

In this section we analyze the problem of optimal scheduling for the case of intensity
control, which is equivalent to replacing the binary scheduling functions $\theta_i(t)$ with continuous
functions with a range between 0 and 1. In this paper we assume a linear cost function of
the form (5). We believe, however, that similar analysis is applicable to the setup with any
differentiable u.

It is easy to see that all the formulas for the distribution function and the expected cost 
from Claim 1 are still valid under intensity control settings. 

For the linear cost function (5), the minimization problem has the form 



n-l 




(31) 



l=i 




(32) 
Without loss of generality, we can assume a + b = 1. This leads to the equivalent
minimization problem

\[
E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} \Bigl( (1 - c) + c \sum_{i=1}^{n} \sigma'_i \Bigr) \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt \to \min, \tag{33}
\]

where c = b/(a + b) can be viewed as a normalized resource weight. The constraints,
however, are more complicated than for the suspend/resume model:

1. As before, $\sigma_i$ must be continuous, and $\sigma_i(0) = \sigma'_i(0) = 0$ (at the beginning all the
processes are idle).

2. We assume $\sigma_i$ to have a piecewise-continuous derivative $\sigma'_i$, and this derivative should
lie between 0 and 1. This requirement follows from the definition of intensity and
the fact that $\sigma'_i = \theta_i$: no process can work for a negative amount of time, and no
process can work with the intensity greater than the one allowed. Since we consider
a framework with shared resources, and the total intensity is limited, we have an
additional constraint: the sum of all the derivatives $\sigma'_i$ at any time point cannot
exceed 1.

Thus, this optimization problem has the following boundary conditions:

\[
\begin{aligned}
&\sigma_i(0) = 0,\ \sigma'_i(0) = 0 && \text{for } i = 1, \ldots, n, \\
&0 \le \sigma'_i \le 1 && \text{for } i = 1, \ldots, n, \\
&0 \le \sum_{i=1}^{n} \sigma'_i \le 1.
\end{aligned}
\tag{34}
\]

We are looking for a set of functions $\{\sigma_i\}$ that provide a solution to minimization
problem (33) under constraints (34).

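Let $g(t, \sigma_1, \ldots, \sigma_n, \sigma'_1, \ldots, \sigma'_n)$ be the function under the integral sign of (33):

\[
g(t, \sigma_1, \ldots, \sigma_n, \sigma'_1, \ldots, \sigma'_n) = \Bigl( (1 - c) + c \sum_{i=1}^{n} \sigma'_i \Bigr) \prod_{j=1}^{n} (1 - F_j(\sigma_j)). \tag{35}
\]

A traditional method for solving problems of this type is to use the Euler-Lagrange necessary
conditions: a set of functions $\sigma_1, \ldots, \sigma_n$ provides a weak (local) minimum to the functional

\[
E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} g(t, \sigma_1, \ldots, \sigma_n, \sigma'_1, \ldots, \sigma'_n)\,dt
\]

only if $\sigma_1, \ldots, \sigma_n$ satisfy a system of equations of the form

\[
\frac{\partial g}{\partial \sigma_k} - \frac{d}{dt} \frac{\partial g}{\partial \sigma'_k} = 0. \tag{36}
\]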



We can prove the following lemma: 



Lemma 2 The Euler-Lagrange conditions for minimization problem (33) yield two strong
invariants:

1. For processes $k_1$ and $k_2$ for which $\sigma_{k_1}$ and $\sigma_{k_2}$ are not on the border described by (34),
the distribution and density functions satisfy

\[
\frac{f_{k_1}(\sigma_{k_1})}{1 - F_{k_1}(\sigma_{k_1})} = \frac{f_{k_2}(\sigma_{k_2})}{1 - F_{k_2}(\sigma_{k_2})}. \tag{37}
\]

2. If the schedules of all the processes are not on the border described by (34), then either
c = 1 or $f_k(\sigma_k) = 0$ for each k.

The proof of the lemma is given in Appendix A.3. The above lemma provides necessary
conditions for a local minimum in the inner points described by constraints (34). These
conditions, however, are very restricting. Therefore, we look for more general conditions,
suitable for boundary points as well (see footnote 7).
We start with the following lemma:

Lemma 3 If an optimal solution for minimization problem (33) under constraints (34)
exists, then there exists an optimal solution $\sigma_1, \ldots, \sigma_n$, such that at each time t all the
resources are consumed, i.e.,

\[
\forall t \quad \sum_{i=1}^{n} \sigma'_i(t) = 1. \tag{38}
\]

In the case where time cost is not zero ($c \ne 1$), the equality above is a necessary condition
for solution optimality.

The proof of the lemma is given in Appendix A.4.

Corollary 2 Under intensity control settings, as in the case of suspend-resume settings,
minimization problem (33) has the form (6), i.e.

\[
E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt \to \min.
\]

Lemma 3 corresponds to our intuition: if a resource is available, it should be used.
Without loss of generality, we restrict our discussion to schedules satisfying (38), even in
the case where time cost is zero. This leads to the following invariant:

\[
\forall t \quad \sum_{i=1}^{n} \sigma_i(t) = t. \tag{39}
\]

Assume now that we have two F-equivalent processes $A_1$ and $A_2$ with density function
f(t) satisfying the normal distribution law with mean value m. Let $t_1$ and $t_2$ be the
cumulative time consumed by each of the processes at time t, i.e., $\sigma_1(t) = t_1$ and $\sigma_2(t) = t_2$.
The question is, which process should be active at t (or should they be active in parallel
with partial intensities)?



7. Note also that even if the conditions above hold, they do not necessarily provide the optimal solution. 
Moreover, problems in variation calculus do not necessarily have a minimum, since there is no analogue 
for the Weierstrass theorem for continuous functions on a closed set. 
Without loss of generality, $t_1 < t_2$, which means that the first process is required to
cover a larger area to succeed: $1 - F(t_1) > 1 - F(t_2)$. This supports a policy that at time t
activates the second process. This policy is further supported if $A_1$ has a lower distribution
density, $f(t_1) < f(t_2)$, as illustrated in Figure 18(a). If, however, the first process has a
higher density, as illustrated in Figure 18(b), it is not clear which of the two processes should
be activated at time t. What is the optimal policy in the general case (see footnote 8)? The answer relies
heavily on the functions that appear in (37). These functions, described by the equation

\[
h_k(t) = \frac{f_k(t)}{1 - F_k(t)}, \tag{40}
\]

are known as hazard functions, and they play a very important role in the following theorem
describing necessary conditions for optimal schedules.

[Figure 18 plots: normal density curves over t with the points $t_1$ and $t_2$ marked, panels (a) and (b).]

Figure 18: (a) Process $A_1$ (currently at $t_1$) has lower density and larger area to cover, and therefore
is inferior. (b) Process $A_1$ has higher density, but larger area to cover, and the decision
is unclear.

Theorem 4 Let the set of functions $\{\sigma_i\}$ be a solution of minimization problem (6) under
constraints (34). Let $t_0$ be a point where the hazard functions of all the processes $h_i(\sigma_i(t))$
are continuous, and let $A_k$ be the process active at $t_0$ ($\sigma'_k(t_0) > 0$), such that for any other
process $A_i$

\[
h_i(\sigma_i(t_0)) < h_k(\sigma_k(t_0)). \tag{41}
\]

Then at $t_0$ process k consumes all the resources, i.e. $\sigma'_k(t_0) = 1$.

The proof of the theorem is given in Appendix A.5.

By Theorem 4 and Equation (37), intensity control may only be useful when hazard
functions of at least two processes are equal. However, even in this case the equilibrium
is not always stable. Assume that within some interval $[t', t'']$ processes $A_i$ and $A_j$ are
working with partial intensity, which implies $h_i(\sigma_i(t)) = h_j(\sigma_j(t))$. Assume now that both
$h_i(t)$ and $h_j(t)$ are monotonically increasing. If at some moment t we give a priority to one
of the processes, it will obtain a higher value of the hazard function, and will get all the
subsequent resources. The only case of stable equilibrium is when $h_i(\sigma_i(t))$ and $h_j(\sigma_j(t))$
are monotonically decreasing functions or constants.

8. Analysis of normal distribution given in Section 6.3 shows that the optimal policy in the example above
is to give all the resources to process $A_2$ in both cases.

The intuitive discussion above is formulated in the following theorem: 

Theorem 5 An active process will remain active and consume all resources as long as its 
hazard function is monotonically increasing. 

The proof is given in Appendix A.6.

This theorem implies the following important corollary:

Corollary 3 If the hazard function of one of the processes is greater than or equal to that
of the others at t = 0 and is monotonically increasing by t, this process should be the only
one to be activated.

We can conclude that the extension of the suspend-resume model to intensity control in 
many cases does not increase the power of the model and is beneficial only for monotonically 
decreasing hazard functions. If no time cost is taken into account (c = 1), however, the 
intensity control permits us to connect the two concepts: that of the model with shared 
resources and that of the model with independent agents: 

Theorem 6 If no time cost is taken into account (c = 1), the model with shared resources 
under intensity control settings is equivalent to the model with independent processes under 
suspend-resume control settings. Namely, given a suspend-resume solution for the model 
with independent processes, we may reconstruct an intensity-based solution with the same 
cost for the model with shared resources and vice versa. 

The proof of the theorem is given in Appendix A.7.

Theorem 4 claims that if the process with the maximal value of $h_k(\sigma_k(t))$ is active, it
will take all the resources. Why, then, would we not always choose the process with the
highest value of $h_k(\sigma_k(t))$ to be active? It turns out that such a strategy is not optimal.
Let us consider two processes with the distribution densities shown in Figure 19(a). The
corresponding values of the hazard functions are shown in Figure 19(b). If we were using
the above strategy, $A_2$ would be the only active process. Indeed, at time t = 0, $h_2(\sigma_2(0)) >
h_1(\sigma_1(0))$, which would lead to the activation of $A_2$. After that moment, $A_1$ would remain
idle and its hazard function would remain 0. This strategy would result in an expected time of 2.
If, on the other hand, we activated $A_1$ only, the result would be an expected
time of 1.5. Thus, although $h_1(\sigma_1(0)) < h_2(\sigma_2(0))$, it is better to give all the resources to
$A_1$ from the beginning due to its superiority in the future.

A more elaborate example is shown in Figure 20. It corresponds to the case of two
processes that are not F-equivalent, one of which is a linear combination of two normal
distributions, $f(t) = 0.5 f_{N(0.6, 0.2)}(t) + 0.5 f_{N(4.0, 2.0)}(t)$, where $f_{N(\mu, \sigma)}(t)$ is the distribution
density of normal distribution with mean value $\mu$ and standard deviation $\sigma$, and the second
process is uniformly distributed in [1.5, 2.5]. Activating $A_1$ only results in 0.5 × 0.6 + 0.5 ×
4.0 = 2.3, activating $A_2$ only results in an expected time of 2.0, while activating $A_1$ for time
1.2 followed by activating $A_2$ results in (approximately) 0.6 × 0.5 + (1.2 + 2.0) × 0.5 = 1.9.
[Figure 19 plots: panel (a) density functions, panel (b) hazard functions.]

Figure 19: The density function and the hazard function for two processes. Although $h_1(\sigma_1(0)) <
h_2(\sigma_2(0))$, it is better to give all the resources to $A_1$.

The best solution is, therefore, to start the execution by activating $A_1$, and at some point $t'$
transfer the control to $A_2$. In this case we interrupt an active process with a greater value
of hazard function, preferring an idle process with a zero value of hazard function (since
$h_1(\sigma_1(t')) > h_2(\sigma_2(t')) = 0$).
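The three schedules compared in the Figure 20 example can be checked numerically by integrating the product of the survival functions, as in (6). The Python sketch below is an illustration under stated assumptions: the function names, the integration horizon and the step size are choices made here, and the small probability mass that the normal mixture places on negative times is treated as an immediate success.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def F1(x):
    """Mixture 0.5 N(0.6, 0.2) + 0.5 N(4.0, 2.0)."""
    return 0.5 * normal_cdf(x, 0.6, 0.2) + 0.5 * normal_cdf(x, 4.0, 2.0)


def F2(x):
    """Uniform distribution on [1.5, 2.5]."""
    return min(1.0, max(0.0, x - 1.5))


def expected_time(switch, horizon=40.0, dt=0.002):
    """Trapezoid approximation of int (1 - F1(s1))(1 - F2(s2)) dt, where
    A1 runs alone until objective time `switch` and A2 runs alone afterwards."""
    steps = int(round(horizon / dt))
    total = 0.0
    for i in range(steps + 1):
        t = i * dt
        s1 = min(t, switch)            # subjective time of A1
        s2 = max(0.0, t - switch)      # subjective time of A2
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (1.0 - F1(s1)) * (1.0 - F2(s2))
    return total * dt


a1_only = expected_time(40.0)   # A1 keeps the processor for the whole horizon
a2_only = expected_time(0.0)    # A2 starts immediately
switched = expected_time(1.2)   # A1 for 1.2 time units, then A2
print(a1_only, a2_only, switched)
```

The figures quoted in the text are rough estimates, so the integrated values need not match them exactly; what the sketch is meant to reproduce is the ordering, with the switching schedule cheaper than either single-process schedule.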





[Figure 20 plots: panels (a) and (b).]

Figure 20: The density function and the hazard function for two processes. The best solution is to
start with $A_1$, and at some point interrupt it in favor of $A_2$, although the latter has a
zero hazard function.

These examples show that a straightforward use of hazard functions for building optimal
schedules can be very problematic. However, since the suspend-resume model is a specific
case of the intensity control model, the hazard functions still may be useful for understanding
the behavior of optimal schedules, and this is used in the next section.
6. Optimal Scheduling for Standard Distributions

In this section we present the results of the optimal scheduling strategy for a system of 
processes whose performance profiles meet one of the well-known distributions: uniform, 
exponential, normal and lognormal. Then we show the results for processes with bimodal 
and multimodal distribution functions. 

We have implemented three scheduling policies for two agents: 

1. Sequential strategy, which schedules the processes one after another, initiating the 
second process when the probability that the first one will find a solution becomes 
negligible. For processes that are not F-equivalent, we choose the best order of process 
invocation. 

2. Simultaneous strategy, which simulates a simultaneous execution of both processes. 

3. Optimal strategy, which is an implementation of the algorithm described in Sec- 
tion 4.2. 

In the rest of this section we compare these three strategies, when no deadline is given, and 
the processes are stopped when the probability that they can still find a solution becomes 
negligible. 

Our goal is to compare different scheduling strategies and not to analyze the behavior of 
the processes. Absolute quantitative measurements, such as average cost, are very process 
dependent, and therefore are not appropriate for scheduling strategy evaluation. We there- 
fore would like to normalize the results of the application of different scheduling methods to 
minimize the effect of the process behavior. In the case of F-equivalent processes, a good 
candidate for the normalization coefficient is the expected time of the individual process. 
For processes that are not F-equivalent, however, the decision is not straightforward, and 
therefore we use the results of the sequential strategy as the normalization factor. 

We define the relative quality $q_{ref}(S)$ of strategy $S$ with respect to strategy $S_{ref}$ as

$$q_{ref}(S) = 1 - \frac{u(S)}{u(S_{ref})}, \qquad (42)$$

where u(S) is the average cost of strategy S. This measurement corresponds to the gain 
(maybe negative) of strategy S relative to the reference strategy. In this section we use the 
sequential strategy as our reference strategy. 
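As a small illustration of this measure, the following Python sketch computes the relative quality under the assumption that (42) reads $q_{ref}(S) = 1 - u(S)/u(S_{ref})$. The function name and the cost values are hypothetical and serve only to show the sign convention: a cheaper strategy yields a positive gain, a more expensive one a negative gain.

```python
def relative_quality(cost_s: float, cost_ref: float) -> float:
    """Relative quality of a strategy with average cost cost_s,
    measured against a reference strategy with average cost cost_ref.
    Assumes the reading q = 1 - u(S) / u(S_ref) of equation (42)."""
    if cost_ref <= 0:
        raise ValueError("reference cost must be positive")
    return 1.0 - cost_s / cost_ref


# Hypothetical average costs, for illustration only.
cheaper = relative_quality(4.0, 5.0)    # positive: S beats the reference
costlier = relative_quality(6.0, 5.0)   # negative: S is worse than the reference
print(cheaper, costlier)
```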

6.1 Uniform Distribution 

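Assume that the goal-time distribution of the processes meets the uniform law over the
interval $[t_0, T]$, i.e., has distribution functions

$$F(t) = \begin{cases} 0 & \text{if } t < t_0, \\ (t - t_0)/(T - t_0) & \text{if } t \in [t_0, T], \\ 1 & \text{if } t > T \end{cases} \qquad (43)$$

and density functions

$$f(t) = \begin{cases} 0 & \text{if } t \notin [t_0, T], \\ 1/(T - t_0) & \text{if } t \in [t_0, T]. \end{cases} \qquad (44)$$

The density function of a process uniformly distributed in [0, 1] is shown in Figure 21(a).
The hazard function of the uniform distribution has the form

$$h(t) = \begin{cases} 0 & \text{if } t < t_0, \\ \dfrac{1/(T - t_0)}{1 - (t - t_0)/(T - t_0)} = \dfrac{1}{T - t} & \text{if } t \in [t_0, T], \end{cases} \qquad (45)$$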



which is a monotonically increasing function. By Corollary 3, only one process will be 
active, and the optimal strategy should be equivalent to the sequential strategy. If the 
processes are not F-equivalent, the problem can be solved by choosing the process with the 
minimal expected time. 

A more interesting setup involves a uniformly distributed process that is not guaranteed
to find a solution. This case corresponds to a probability of success p that is less than 1. As
claimed in Section 4.5, the corresponding distribution and density functions should
be multiplied by p. As a result, the hazard function becomes

$$h(t) = \begin{cases} 0 & \text{if } t < t_0, \\ \dfrac{p}{(T - t_0) - p(t - t_0)} & \text{if } t \in [t_0, T]. \end{cases} \qquad (46)$$



This function is still monotonically increasing in t, and the conclusions remain the same.
The graphs of the hazard functions of processes uniformly distributed in [0, 1] with probability
of success of 0.5, 0.8 and 1 are shown in Figure 21(b).
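The monotonicity claim can be checked numerically. The sketch below evaluates the hazard $p f(t) / (1 - p F(t))$ of a uniform goal-time distribution, which on $[t_0, T]$ simplifies to $p / ((T - t_0) - p(t - t_0))$. The function name is hypothetical, and returning 0 for $t > T$ (as well as infinity at $t = T$ when $p = 1$) is a convention chosen here, not something stated in the text.

```python
def uniform_hazard(t, t0=0.0, T=1.0, p=1.0):
    """Hazard p*f(t) / (1 - p*F(t)) for a goal time uniform on [t0, T]
    that is found with overall probability p."""
    if t < t0 or t > T:
        return 0.0                      # convention used in this sketch
    denom = (T - t0) - p * (t - t0)
    return float("inf") if denom == 0 else p / denom


# Check monotonic growth on [0, 1) for the three success probabilities of Figure 21(b).
for p in (0.5, 0.8, 1.0):
    values = [uniform_hazard(i / 100, p=p) for i in range(100)]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    print(p, increasing)
```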








Figure 21: (a) The density function of a process, uniformly distributed in [0, 1], (b) hazard functions 
for processes uniformly distributed in [0, 1] with probability of success of 0.5, 0.8 and 1. 



6.2 Exponential Distribution 

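The exponential distribution is described by the density function

$$f(t) = \begin{cases} 0 & \text{if } t < 0, \\ \lambda e^{-\lambda t} & \text{if } t \geq 0, \end{cases} \qquad (47)$$

and the distribution function has the form

$$F(t) = \begin{cases} 0 & \text{if } t < 0, \\ 1 - e^{-\lambda t} & \text{if } t \geq 0. \end{cases} \qquad (48)$$

Substituting these expressions into (6) gives

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^\infty \prod_{j=1}^n (1 - F_j(\sigma_j))\,dt = \int_0^\infty e^{-\sum_{j=1}^n \lambda_j \sigma_j}\,dt.$$

For a system with F-equivalent processes, by Lemma 3

$$\sum_{j=1}^n \lambda_j \sigma_j = \lambda \sum_{j=1}^n \sigma_j = \lambda t,$$

and therefore

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}.$$

Thus, for a system with F-equivalent processes all the schedules are equivalent. This interesting
fact is reflected also in the behavior of the hazard function, which is constant:

$$h(t) = \lambda.$$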
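The schedule independence can also be observed numerically. The following sketch integrates the probability that no process has found a solution yet, $\prod_j (1 - F(\sigma_j(t)))$, for three schedules of two F-equivalent exponential processes with success probability 1. The schedule functions, the 30/70 split, the value of the rate and the finite integration horizon are all illustrative assumptions; each schedule assigns the whole processor, so the allocated times sum to t.

```python
import math


def expected_cost(schedule, lam, horizon=40.0, steps=20_000):
    """Trapezoid estimate of the integral over [0, horizon] of
    prod_j (1 - F(sigma_j(t))), where F(s) = 1 - exp(-lam * s)."""
    def survival(t):
        return math.prod(math.exp(-lam * s) for s in schedule(t))

    h = horizon / steps
    total = 0.5 * (survival(0.0) + survival(horizon))
    total += sum(survival(i * h) for i in range(1, steps))
    return total * h


def sequential(t):       # only the first process ever runs
    return (t, 0.0)


def simultaneous(t):     # both processes get half of the processor
    return (t / 2, t / 2)


def uneven(t):           # hypothetical 30/70 split of the processor
    return (0.3 * t, 0.7 * t)


for name, schedule in [("sequential", sequential),
                       ("simultaneous", simultaneous),
                       ("uneven", uneven)]:
    # Every schedule should give a value close to 1 / lam.
    print(name, expected_cost(schedule, lam=2.0))
```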

However, if the probability of success is smaller than 1, the hazard function becomes a
monotonically decreasing function:

$$h(t) = \frac{p\lambda e^{-\lambda t}}{1 - p(1 - e^{-\lambda t})} = \frac{p\lambda}{p + (1 - p)e^{\lambda t}}.$$



Such processes should work simultaneously (with identical intensities for F-equivalent pro- 
cesses, and with intensities maintaining the equilibrium of hazard functions otherwise), since 
each process which has been idle for a while has an advantage over its working teammate. 

Figure 22(a) shows the density function of an exponentially distributed process with
$\lambda = 1$. The graphs of the hazard functions of processes exponentially distributed with
$\lambda = 1$ and probability of success of 0.5, 0.8 and 1 are shown in Figure 22(b).

Let us consider a somewhat more elaborate example, involving processes that are not
F-equivalent. Assume that we have two learning systems, both with an exponential-like
performance profile typical of such systems. We also assume that one of the systems requires
a delay for preprocessing but works faster. Thus, we assume that the first system has a
distribution density $f_1(t) = \lambda_1 e^{-\lambda_1 t}$, and the second one has a density $f_2(t) = \lambda_2 e^{-\lambda_2 (t - t_2)}$,
such that $\lambda_1 < \lambda_2$ (the second is faster), and $t_2 > 0$ (it also has a delay). Assume that both
learning systems are deterministic over a given set of examples, and that they may fail to
learn the concept with the same probability of $1 - p = 0.5$. The graphs of the density and
hazard functions of the two systems are shown in Figure 23.

We applied the optimal scheduling algorithm of Section 4.2 for the values $\lambda_1 = 3$,
$\lambda_2 = 10$, and $t_2 = 5$. The optimal schedule is to activate the first system for 1.15136 time
units, then (if it found no solution) to activate the second system for 5.77652 time units.






FINKELSTEIN, MARKOVITCH & RIVLIN





Figure 22: (a) The density function of a process, exponentially distributed with $\lambda = 1$, (b) hazard
functions for processes exponentially distributed with $\lambda = 1$ and probability of success
of 0.5, 0.8 and 1.

Figure 23: (a) Density and (b) hazard functions for two exponentially distributed systems, with
different values of $\lambda$ and time shift.

Then the first system will run for an additional 3.22276 time units, and finally the second
system will run for 0.53572 time units. If at this point no solution has been found, both
systems have failed with a probability of $1 - 10^{-6}$ each.
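To make the benefit of this four-phase schedule concrete, the sketch below estimates its expected cost and compares it with a schedule that runs the two systems one after another for the same total time each. It rests on several assumptions that are not spelled out here: the cost is taken to be the integral, over the schedule, of the probability that neither system has succeeded yet; the second system has zero success probability before its delay has elapsed; and the comparison schedule is only a stand-in for the sequential strategy. All function and variable names are hypothetical.

```python
import math

P_SUCCESS = 0.5                        # probability that a system can learn the concept
LAM1, LAM2, DELAY2 = 3.0, 10.0, 5.0    # parameter values quoted in the text


def cdf1(s):
    return 1.0 - math.exp(-LAM1 * s)


def cdf2(s):
    # No success is possible before the preprocessing delay has elapsed.
    return 0.0 if s < DELAY2 else 1.0 - math.exp(-LAM2 * (s - DELAY2))


def expected_cost(schedule, cdfs, p=P_SUCCESS, dt=1e-3):
    """Midpoint-rule integral of prod_j (1 - p * F_j(time used by j)).

    schedule is a list of (process_index, duration) pairs run back to back."""
    used = [0.0] * len(cdfs)           # processor time consumed by each system
    cost = 0.0
    for proc, duration in schedule:
        steps = max(1, round(duration / dt))
        h = duration / steps
        for _ in range(steps):
            mid = list(used)
            mid[proc] += h / 2         # evaluate at the middle of the step
            surv = 1.0
            for j, cdf in enumerate(cdfs):
                surv *= 1.0 - p * cdf(mid[j])
            cost += surv * h
            used[proc] += h
    return cost


four_phase = [(0, 1.15136), (1, 5.77652), (0, 3.22276), (1, 0.53572)]
one_after_another = [(0, 1.15136 + 3.22276), (1, 5.77652 + 0.53572)]

cost_four_phase = expected_cost(four_phase, [cdf1, cdf2])
cost_sequential = expected_cost(one_after_another, [cdf1, cdf2])
print(cost_four_phase, cost_sequential)
```

Under these assumptions the four-phase schedule should come out cheaper, because the first system's chance of still succeeding drops quickly after about one time unit, while the second system needs its full delay before it can contribute.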

Figure 24(a) shows the relative quality of the simultaneous and optimal scheduling
strategies as a function of $t_2$ for p = 0.8 (for 10000 simulated examples). For large values
of $t_2$ the benefit of switching from the first algorithm to the second decreases, and this is
reflected in the relative quality of the optimal strategy. The simultaneous strategy, as we
can see, is beneficial only for relatively small values of $t_2$.

Figure 24(b) reflects the behavior of the strategies for a fixed value of $t_2 = 5.0$ as a
function of the probability of success p. The simultaneous strategy is inferior, and its quality
decreases as p increases. Indeed, when the probability of success is 1, running the second
algorithm and the first one simultaneously will be a waste of time. On the other hand, the
optimal strategy has a positive benefit, which means that the resulting schedules are not
trivial.






Figure 24: Learning systems: Relative quality of optimal and simultaneous scheduling strategies
(a) as a function of $t_2$ (delay of the second system) for fixed p = 0.8, and (b) as a function of p
(probability of success) for fixed $t_2 = 5$.



6.3 Normal Distribution 

The normal distribution with mean value m and standard deviation $\sigma$ is described by the density
function

$$f(t) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-m)^2}{2\sigma^2}} \qquad (49)$$

and its distribution function is

$$F(t) = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{t} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx. \qquad (50)$$

Since we use $t_0 = 0$, we should have used a truncated normal distribution with a distribution
density

$$f(t) = \frac{1}{1 - \mu} \cdot \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-m)^2}{2\sigma^2}}$$

and a distribution function

$$F(t) = \frac{1}{1 - \mu} \left( \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{t} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx - \mu \right),$$

where

$$\mu = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{0} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx,$$

but if m is large enough, $\mu$ may be considered to be 0. The density function of a normally
distributed process with m = 5 and $\sigma = 1$ is shown in Figure 25(a).

The hazard function of a normal distribution is monotonically increasing, which leads 
to the same conclusions as for a uniform distribution. However, a probability of success of 
less than 1 completely changes the behavior of the hazard function: after some point, it 
starts to decrease. The graphs for hazard functions of processes normally distributed with 
a mean value of 5, standard deviation of 1 and probabilities of success of 0.5, 0.8 and 1 are 
shown in Figure 25(b). 
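The change in the shape of the hazard function can be reproduced with a few lines of Python. The sketch below ignores the truncation at zero (that is, it assumes the truncation mass is negligible for m = 5 and a standard deviation of 1) and evaluates $p f(t) / (1 - p F(t))$ on a grid. The function name, the grid and the use of `math.erfc` for the upper tail are choices made for this illustration.

```python
import math


def normal_hazard(t, m=5.0, sigma=1.0, p=1.0):
    """Hazard p*f(t) / (1 - p*F(t)) for a normally distributed goal time
    found with overall probability p (truncation at 0 is ignored)."""
    z = (t - m) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))     # 1 - F(t), avoids cancellation
    return p * density / ((1.0 - p) + p * tail)


grid = [i / 100 for i in range(901)]               # t from 0 to 9

certain = [normal_hazard(t, p=1.0) for t in grid]  # success guaranteed
risky = [normal_hazard(t, p=0.8) for t in grid]    # success probability 0.8

peak_t = grid[risky.index(max(risky))]
print("p = 1 increasing:", all(a < b for a, b in zip(certain, certain[1:])))
print("p = 0.8 peaks at t =", peak_t)
```

With p = 1 the hazard keeps growing over the whole grid, whereas with p = 0.8 it reaches a maximum shortly after the mean and then falls, which is the point where an idle process starts to look more attractive than the running one.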





Figure 25: (a) The density function of a normally distributed process, with m = 5 and $\sigma = 1$, (b)
hazard functions for normally distributed processes with m = 5 and $\sigma = 1$, with the
probabilities of success of 0.5, 0.8 and 1.



As in the previous example, we now consider a case of two processes that are not
F-equivalent, running with the same deviation $\sigma = 1$ and the same probability of success
p. The first process is assumed to have $m_1 = 1$, while the second process is started with
some delay $\Delta m$. The relative quality for 10000 simulated examples is shown in Figure 26.
Figure 26(a) shows the relative quality as a function of $\Delta m$ for p = 0.8; Figure 26(b) shows
the relative quality as a function of p for $\Delta m = 2$. Unlike the exponential distribution, the gain
of the optimal strategy in this example is rather small.

Figure 26: Normal distribution: relative quality (a) as a function of $\Delta m$ (delay of the second
process) for fixed p = 0.8, and (b) as a function of p (probability of success) for fixed $\Delta m = 2$.

6.4 Lognormal Distribution

The random variable X is lognormally distributed if $\ln X$ is normally distributed. The
density function and the distribution function with the corresponding parameters m and $\sigma$
can be written as

$$f(t) = \frac{1}{t\sqrt{2\pi}\sigma} e^{-\frac{(\log(t)-m)^2}{2\sigma^2}},$$

$$F(t) = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\log(t)} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx. \qquad (52)$$

The lognormal distribution plays a significant role in AI applications since in many cases search
time is distributed under the lognormal law. The density function of the lognormal distribution
with mean value of log(5.0) and standard deviation of 1.0 is shown in Figure 27(a),
and the hazard functions for different values of p are shown in Figure 27(b).

Let us consider a simulated experiment similar to its analogue for the normal distribution. We
consider two processes that are not F-equivalent, with the parameter $\sigma = 1$ and the same
probability of success p. The first process is assumed to have $m_1 = 1$, while the second process
is started with some delay, such that $m_2 - m_1 = \Delta m > 0$. The relative quality for 10000 simulated
examples is shown in Figure 28. Figure 28(a) shows the relative quality as a function of
$\Delta m$ for p = 0.8; Figure 28(b) shows the relative quality as a function of p for $\Delta m = 2$. The
graphs show that for small values of $\Delta m$ both the optimal and the simultaneous strategy
have a significant benefit over the sequential one. However, for larger values, the performance
of the optimal strategy approaches the performance of the sequential strategy, while
the simultaneous strategy becomes inferior.

Figure 27: (a) Density function for lognormal distribution with mean value of log(5.0) and standard
deviation of 1.0 and (b) hazard functions for lognormally distributed processes with
mean value of log(5.0), standard deviation of 1, and the probabilities of success of 0.5,
0.8 and 1.

Figure 28: Lognormal distribution: relative quality (a) as a function of $\Delta m$ for fixed p = 0.8, and
(b) as a function of p for fixed $\Delta m = 2$.

6.5 Bimodal and Multimodal Density Functions

Experiments show that in the case of F-equivalent processes with a unimodal distribution
function, the sequential strategy is often optimal. In this section we consider less trivial
distributions.

Assume first that we have a non-deterministic algorithm with a performance profile
expressed by a linear combination of two normal distributions with the same deviation:

$$f(t) = 0.5 f_{N(\mu_1, \sigma)} + 0.5 f_{N(\mu_2, \sigma)}.$$

An example of the density and hazard functions of such distributions with $\mu_1 = 2$, $\mu_2 = 5$,
and $\sigma = 0.5$ is given in Figure 29.




Figure 29: (a) Density function and (b) hazard function for a process distributed according to
the density function $f(t) = 0.5 f_{N(2, 0.5)} + 0.5 f_{N(5, 0.5)}$ with the probability of success of
p = 0.8.



Assume that we invoke two runs of this algorithm with fixed values of $\mu_1 = 2$, $\sigma = 0.5$,
and p = 0.8, and the free variable $\mu_2$. Figure 30 shows how the relative quality of the
scheduling strategies is influenced by the distance between the peaks, $\mu_2 - \mu_1$. The results
correspond to the intuitive claim that the larger the distance between the peaks, the more
attractive the optimal and the simultaneous strategies become.




Figure 30: Bimodal distribution: relative quality as a function of the distance between the peaks.

Now let us see how the number of peaks of the density function affects the scheduling
quality. We consider a case of partial uniform distribution, where the density is distributed
over k identical peaks of length 1 placed symmetrically in the time interval from 0 to 100.
(Thus, the density function will be equal to 1/k when t belongs to one of such peaks, and
0 otherwise.) In this experiment we have chosen p = 1.

Figure 31 shows the relative quality of the system as a function of k, obtained for
10000 randomly generated examples. We can see from the results that the simultaneous
strategy is inferior, due to the "valleys" in the distribution function. The optimal strategy
returns schedules where the processes switch after each peak, but the relative quality of the
schedules decreases as the number of peaks increases.

Figure 31: Multimodal distribution: relative quality as a function of the number of peaks.

7. Experiments: Using Optimal Scheduling for the Latin Square Problem

To test the performance of our algorithm in a realistic domain, we applied it to the Latin 
Square problem described in Section 2.2. We assume that we are given a Latin Square 
problem with two initial configurations, and a fully deterministic algorithm with distribution 
function and distribution density shown in Figure 7. 

We compare the performance of the schedule produced by our algorithm to the perfor- 
mance of the sequential and simultaneous strategies described in Section 6. In addition, 
we test a schedule which runs the processes one after another, allowing a single switch at 
the optimal point (an analogue of the restart technique for two processes). We refer to this 
schedule as a single-point restart schedule. 

Note that the case of two initial configurations corresponds to the case of two processes 
in our framework. In general, we could think of a set of n initial configurations that would 
correspond to n processes. For sufficiently large n, the restart strategy where each restart 
starts with a different initial configuration, becomes close to optimal. 

Our experiments were performed for different values of N, with 10% of the square pre-colored.
The performance profile was induced based on a run of 50,000 instances, and the
remaining 50,000 instances were used as 25,000 testing pairs. All the schedules were applied
with a fixed deadline T, which corresponds to the maximal allowed number of generated
nodes.

Since the results of the sequential strategy for this type of problem are much worse
than the results of the other strategies for sufficiently large values of T, we instead used the
simultaneous strategy as the reference in the relative quality measure.




Figure 32: Relative quality as a function of maximal allowed time T



Figure 32 shows how maximal available time T (the x axis) influences the quality of the 
schedules (the y axis), where the simultaneous strategy has been used as a reference. 

For small values of T, both single-point restart and the optimal strategy have about a
25% gain over the simultaneous strategy, since they produce schedules which are close to
the sequential one. However, when the available time T increases, the benefit of parallelization
becomes more significant, and the simultaneous strategy overtakes the single-point restart
strategy. The relative quality of the optimal schedule also decreases when T increases, since
the resulting schedule contains more switches between the two problem instances being
solved.

Figure 33 illustrates how the optimal and single-point restart schedules relate to the
simultaneous schedule for Latin Squares of different sizes (given T = 25,000). The initial gain
of both strategies is about 50%. However, for the problems with N = 20 the single-point
restart strategy becomes worse than the simultaneous one. For larger sizes the probability
of solving the Latin Square problem with a time limit of 25,000 steps becomes smaller and
smaller, and the benefit of the optimal strategy also approaches zero.

Figure 33: Relative quality as a function of the size of the Latin Square



8. Combining Restart and Scheduling Policies 

Luby, Sinclair, and Zuckerman (1993) showed that the restart strategy is optimal if an 
infinite number of identical runs are available. When this number is limited, the restart 
strategy is not optimal. Sometimes, however, we have a mixed situation. Assume that we 
have two initial states, a non-deterministic algorithm, and a linear time cost. On one hand,
we can perform restarts of a run corresponding to one of the initial states. On the other 
hand, we can switch between the runs corresponding to the two initial states. What would 
be an optimal policy in this case? 

The expected time of a run based on a single initial state is

$$E(T) = \frac{1}{F(t^*)} \int_0^{t^*} (1 - F(t))\,dt, \qquad (53)$$

where $t^*$ is the restart point and F(t) is the distribution function. This formula is obtained
by a simple summation of the geometric series with coefficient $1 - F(t^*)$, and is a continuous
form of the formula given by Luby, Sinclair, and Zuckerman (1993). Minimization of (53)
with respect to $t^*$ gives us the optimal restart point.
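The minimization can be carried out numerically once a distribution function is available. The sketch below evaluates the expected time $\frac{1}{F(t^*)} \int_0^{t^*} (1 - F(t))\,dt$ by the trapezoid rule and scans a grid of candidate restart points. The lognormal parameters are borrowed from Figure 27; the grid, the step count and all function names are assumptions made for this illustration, and a grid scan is used instead of a proper optimizer only to keep the example short.

```python
import math


def restart_expected_time(cdf, t_star, steps=2000):
    """E(T) = (1 / F(t*)) * integral_0^{t*} (1 - F(t)) dt, trapezoid rule."""
    h = t_star / steps
    area = 0.5 * ((1.0 - cdf(0.0)) + (1.0 - cdf(t_star)))
    area += sum(1.0 - cdf(i * h) for i in range(1, steps))
    return area * h / cdf(t_star)


def lognormal_cdf(t, m=math.log(5.0), sigma=1.0):
    if t <= 0:
        return 0.0
    return 0.5 * math.erfc(-(math.log(t) - m) / (sigma * math.sqrt(2.0)))


def exponential_cdf(t, lam=1.0):
    return 1.0 - math.exp(-lam * t)


# Scan candidate restart points t* = 0.25, 0.5, ..., 50 for the lognormal profile.
candidates = [0.25 * i for i in range(1, 201)]
best_t = min(candidates, key=lambda t: restart_expected_time(lognormal_cdf, t))
best_value = restart_expected_time(lognormal_cdf, best_t)
print(best_t, best_value)

# For an exponential profile the restart point should not matter.
print(restart_expected_time(exponential_cdf, 0.5),
      restart_expected_time(exponential_cdf, 5.0))
```

For this lognormal profile the mean without restarts is $5e^{0.5} \approx 8.24$, so any restart point whose expected time falls below that value already improves on a single uninterrupted run. For the exponential profile the expected time stays at $1/\lambda$ regardless of the restart point, in line with the constant hazard function of Section 6.2.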

Assume first that the sequence of restarts on a single initial state is a process interruptible
only at the restart points. Since the probability of failure of i successive restarts
is $(1 - F(t^*))^i$, this process is exponentially distributed. Thus, the problem is reduced to
scheduling of two exponentially distributed processes. According to the analysis in Section 6.2,
all schedules are equivalent if the problems corresponding to the two initial states
are solvable. Otherwise, the optimal policy is to alternate between the two processes at
each restart point.

A more interesting case is when we allow rescheduling at any time point. In general,
it is not beneficial to switch between the processes at non-restart points (otherwise these
rescheduling points would have been chosen for restart). Such rescheduling, however, can
be beneficial if the cost associated with restarts is higher than the rescheduling cost (see footnote 9).

9. An example of such a setup is robotic search, where returning the robot to the initial position is more
expensive than suspending and resuming the robot.

Let us assume that each restart has a constant cost C. Similarly to (53), we can write
the expected cost of a policy performing restarts at point $t^{**}$ as

(54)

where the second term corresponds to the series

$$0 + C(1 - F(t^{**})) + 2C(1 - F(t^{**}))^2 + \ldots$$

Let $t^{**}$ and $t^*$ be the optimal restart points for the setups with and without associated costs,
respectively. Then $t^{**}$ should be greater than $t^*$ due to the restart cost.

Let us consider the following schedule: the first process runs for $t^*$, then the second
process runs for $t^*$, then the first process runs (with no restart) for an additional $t^{**} - t^*$, then
the second process runs for an additional $t^{**} - t^*$. Then the first process restarts and runs for
$t^*$ and so forth.

Let us compare the expected time of such a schedule with the time of the pure restart
policy, where the first process runs for $t^{**}$, then the second process runs for $t^{**}$, then the
first process restarts and runs for $t^{**}$ and so forth.

Similarly to (15), the expected time of the first schedule in the interval $[0, 2t^{**}]$ can be
written as

$$E_{sched} = (2 - F(t^*)) \int_0^{t^*} (1 - F(t))\,dt + (2 - F(t^*) - F(t^{**})) \int_{t^*}^{t^{**}} (1 - F(t))\,dt.$$

On the other hand, the expected time of the second schedule in the same interval is

$$E_{simple} = (2 - F(t^{**})) \int_0^{t^{**}} (1 - F(t))\,dt.$$

$E_{sched}$ can be rewritten as

$$\begin{aligned} E_{sched} &= \int_0^{t^*} (1 - F(t))\,dt + (1 - F(t^*)) \int_0^{t^{**}} (1 - F(t))\,dt \\ &\quad + (1 - F(t^{**})) \int_0^{t^{**}} (1 - F(t))\,dt - (1 - F(t^{**})) \int_0^{t^*} (1 - F(t))\,dt \\ &= F(t^{**}) \int_0^{t^*} (1 - F(t))\,dt - F(t^*) \int_0^{t^{**}} (1 - F(t))\,dt + E_{simple}. \end{aligned}$$

Thus, we obtain

$$E_{simple} - E_{sched} = F(t^*) \int_0^{t^{**}} (1 - F(t))\,dt - F(t^{**}) \int_0^{t^*} (1 - F(t))\,dt,$$

and since $t^*$ provides minimum for (53), the last expression is positive, which means that
scheduling improves a simple restart policy.

Note, that we do not claim that the proposed scheduling policy is optimal - our example 
just shows that the pure restart strategy is not optimal. There should be an optimal 
combination interleaving restarts on the global level and scheduling on the local level, but 
finding this combination is left for future research. 
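The comparison between the interleaved schedule and the pure restart policy can be checked numerically for a concrete profile. The sketch below assumes a lognormal distribution function with the parameters of Figure 27, approximates $t^*$ by a grid scan of the single-state restart formula, and picks a hypothetical $t^{**} = 1.5\,t^*$ (the restart cost C is not modeled, so this value is an assumption rather than an optimum). It then computes the expected times of both schedules over $[0, 2t^{**}]$ piece by piece, as the integral of the probability that no solution has been found yet.

```python
import math


def cdf(t, m=math.log(5.0), sigma=1.0):
    """Lognormal distribution function (parameters borrowed from Figure 27)."""
    if t <= 0:
        return 0.0
    return 0.5 * math.erfc(-(math.log(t) - m) / (sigma * math.sqrt(2.0)))


def surv_integral(a, b, steps=1000):
    """Trapezoid estimate of the integral of (1 - F(t)) over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * ((1.0 - cdf(a)) + (1.0 - cdf(b)))
    total += sum(1.0 - cdf(a + i * h) for i in range(1, steps))
    return total * h


def restart_time(t):
    return surv_integral(0.0, t) / cdf(t)


t1 = min((0.25 * i for i in range(4, 121)), key=restart_time)   # approximates t*
t2 = 1.5 * t1                                                    # hypothetical t** > t*

A = surv_integral(0.0, t1)     # integral of (1 - F) over [0, t*]
B = surv_integral(t1, t2)      # integral of (1 - F) over [t*, t**]

# Interleaved schedule: P1 for t*, P2 for t*, P1 up to t**, P2 up to t**.
e_sched = A + (1 - cdf(t1)) * A + (1 - cdf(t1)) * B + (1 - cdf(t2)) * B
# Pure restart policy: P1 for t**, then P2 for t**.
e_simple = (A + B) + (1 - cdf(t2)) * (A + B)

gain = e_simple - e_sched
closed_form = cdf(t1) * (A + B) - cdf(t2) * A
print(gain, closed_form)
```

The two printed numbers should agree up to rounding, and they should be positive whenever $t^*$ minimizes the single-state restart time, since the difference equals $F(t^*)F(t^{**})$ times the gap between the restart times at $t^{**}$ and at $t^*$.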

9. Conclusions 

In this work we present an algorithm for optimal scheduling of anytime algorithms with 
shared resources. We first introduce a formal framework for representing and analyzing 
scheduling strategies. We begin by analyzing the case where the only allowed schedul- 
ing operations are suspending and resuming processes. We prove necessary conditions for 
schedule optimality and present an algorithm for building optimal schedules that is based on 
those conditions. We then analyze the more general case where the scheduler can increase or 
decrease the intensity of the scheduled processes. We prove necessary conditions and show 
that intensity control is only rarely needed. We then analyze, theoretically and empirically, 
the behavior of our scheduling algorithm for various distribution types. Finally, we present 
empirical results of applying our scheduling algorithm to the Latin Square problem. 

The results show that the optimal strategy indeed outperforms other scheduling strate- 
gies. For lognormal distribution, we showed an improvement of more than 50% over the 
naive sequential strategy. In general, our algorithm is particularly beneficial for heavy-tailed 
distributions, but even for exponential distribution we show a benefit of more than 35%. 

In some cases, however, simple scheduling strategies yield results similar to those obtained
by our algorithm. For example, the optimal schedule for uniform distribution is to
apply one of the processes with no switch. When the probability of succeeding within the given
time limit approaches 1, this simple scheduling strategy also becomes close to optimal, at
least for unimodal distributions with no strong skew towards zero. On the other hand,
when the probability of success approaches zero, another simple strategy that applies the
processes simultaneously becomes close to optimal.

Such behavior matches intuition. For heavy-tailed distributions, switching between
the runs is promising because the chance of being on a bad trajectory is high enough. The
same holds for distributions with a low probability of success. However, if the probability
of being on a bad trajectory is too high, the best strategy is to switch between the runs as
fast as possible, which is equivalent to the simultaneous strategy. On the other hand, if the
distribution is too skewed to the right, there is often no point in switching between the runs,
since the new run must pay a high penalty before it reaches the "promising" distribution
area. In general, when the user is certain that the particular application falls under one of
the categories above, the cost of calculating the optimal schedule can be saved.

The high complexity of computation is one of the potential weaknesses of the presented 
algorithm. This complexity can be represented as a product of three factors: function
minimization, Branch-and-Bound search, and solving Equations (18) and (19) for the case 
of two agents or Equation (28) for the general case. For two agents, the only exponen- 
tial component is the Branch-and-Bound search. We found, however, that in practice the 
branching factor, which is roughly the number of roots of the equations above, is rather 
small, while the depth of the search tree can be controlled by iterative-deepening strategies. 
For an arbitrary number of agents, function minimization may also be exponential. In prac- 
tice, however, it depends on the behavior of the minimized function and the minimization 
algorithm. 

Since the optimal schedule is static and can be applied to a large number of problem 
instances, its computation is beneficial even when associated with high cost. Moreover, in 
some applications (such as robotic search) the computational cost can be outweighed by 
the gain obtained from a single invocation. 

The previous work most related to our research is the restart framework (Luby et al., 
1993). The most important difference between our algorithm and the restart policy is the 
ability to handle the cases where the number of runs is limited, or where different algorithms 
are involved. When only one algorithm is available and the number of runs is infinite, the 
restart strategy is optimal. However, as we have shown in Section 8, some problems may 
benefit from the combination of these two approaches. 

Our algorithm assumes the availability of the performance profiles of the processes. Such 
performance profiles can be derived analytically using theoretical models of the processes 
or empirically from previous experience with solving similar problems. Online learning of 
performance profiles, which could expand the applicability of the proposed framework, is a 
subject of ongoing research. 

The framework presented here can be used for a wide range of applications. In the intro- 
duction we presented three examples. The first example describes two alternative learning 
algorithms working in parallel. The behavior of such algorithms is usually exponential, and 
the analysis for such setup is given in Section 6.2. The second example is a CSP problem 
with two alternative initial configurations, which is analogous to the Latin Square example 
of Sections 2.2 and 7. The last example includes crawling processes with a limited shared 
bandwidth. Unlike the first two examples, this setup falls under the framework of intensity
control described in Section 5.

Similar schemes may be applied to more elaborate setups:

• Scheduling a system of n anytime algorithms, where the overall cost of the system is
defined as the maximal cost of its components (unlike the analysis in Section 4, this
function is not differentiable);

• Scheduling with non-zero process switch costs; 

• Providing dynamic scheduling algorithms able to handle changes in the environment; 

• Building effective algorithms for the case of several resources of different types, e.g., 
multiprocessor systems. 

Appendix A. Formal Proofs 
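A.1 Proof of Lemma 1

The claim of the lemma is as follows:

For a system of n processes, the expression for the expected cost (6) can be rewritten as

$$E_u(\zeta_1, \ldots, \zeta_n, \ldots) = \sum_{k=0}^{\infty} \sum_{i=1}^{n} \prod_{j=i+1}^{i+n-1} \left(1 - F_{\#j}(\zeta_{k-1}^{j})\right) \int_{\zeta_{k-1}^{i}}^{\zeta_{k}^{i}} (1 - F_i(x))\,dx. \qquad (55)$$

Proof: Splitting the whole integration range $[0, \infty)$ into the intervals $[t_k^{i-1}, t_k^i]$ yields
the following expression:

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^\infty \prod_{j=1}^n (1 - F_j(\sigma_j))\,dt = \sum_{k=0}^{\infty} \sum_{i=1}^{n} \int_{t_k^{i-1}}^{t_k^i} \prod_{j=1}^n (1 - F_j(\sigma_j))\,dt. \qquad (56)$$

By (25), we can rewrite the inner integral as

$$\begin{aligned} \int_{t_k^{i-1}}^{t_k^i} \prod_{j=1}^n (1 - F_j(\sigma_j))\,dt &= \int_{t_k^{i-1}}^{t_k^i} \prod_{j=1}^{i-1} (1 - F_j(\zeta_k^j)) \cdot (1 - F_i(t - t_k^{i-1} + \zeta_{k-1}^i)) \prod_{j=i+1}^{n} (1 - F_j(\zeta_{k-1}^j))\,dt \\ &= \prod_{j=i+1-n}^{i-1} (1 - F_{\#j}(\zeta_k^j)) \int_{t_k^{i-1}}^{t_k^i} (1 - F_i(t - t_k^{i-1} + \zeta_{k-1}^i))\,dt. \end{aligned} \qquad (57)$$

Substituting x for $t - t_k^{i-1} + \zeta_{k-1}^i$ and using (23), we obtain

$$\begin{aligned} \prod_{j=i+1-n}^{i-1} (1 - F_{\#j}(\zeta_k^j)) \int_{t_k^{i-1}}^{t_k^i} (1 - F_i(t - t_k^{i-1} + \zeta_{k-1}^i))\,dt &= \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{k-1}^j)) \int_{\zeta_{k-1}^i}^{t_k^i - t_k^{i-1} + \zeta_{k-1}^i} (1 - F_i(x))\,dx \\ &= \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{k-1}^j)) \int_{\zeta_{k-1}^i}^{\zeta_k^i} (1 - F_i(x))\,dx. \end{aligned} \qquad (58)$$

Combining (56), (57) and (58) gives us (55).
Q.E.D.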



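A.2 Proof of the Chain Theorem for n Processes

The chain theorem claim is as follows:

The value for $\zeta_{m+1}^{l-1}$ may either be $\zeta_m^{l-1}$ or can be computed given the previous 2n - 2
values of $\zeta$ using the formula

$$\frac{f_l(\zeta_m^l)}{1 - F_l(\zeta_m^l)} = \frac{\prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) - \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j))}{\sum_{i=l-n+1}^{l-1} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_{\#i}(x))\,dx}. \qquad (59)$$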

Proof: By Lemma 1, the expression we want to minimize is described by the equation

$$E_u(\zeta_1, \ldots, \zeta_n, \ldots) = \sum_{k=0}^{\infty} \sum_{i=1}^{n} \prod_{j=i+1}^{i+n-1} \left(1 - F_{\#j}(\zeta_{k-1}^{j})\right) \int_{\zeta_{k-1}^{i}}^{\zeta_{k}^{i}} (1 - F_i(x))\,dx. \qquad (60)$$

The expression above reaches its optimal values either when

$$\frac{\partial E_u}{\partial \zeta_j} = 0 \quad \text{for } j = 1, \ldots, n, \ldots, \qquad (61)$$

or on the border described by (26).

Reaching the optimal values on the border corresponds to the first alternative described
in the theorem. Let us now consider a case when the derivative of $E_u$ by $\zeta_j$ is 0.

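Each variable $\zeta_j$ may be presented as $\zeta_{mn+l} = \zeta_m^l$, where $0 \leq l \leq n - 1$. Let us see which
summation terms of (60) $\zeta_m^l$ is participating in.

1. $\zeta_m^l$ may be a lower bound of the integral from (60). This happens when k = m + 1
and i = l. The corresponding term is

$$S_0 = \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^l}^{\zeta_{m+1}^l} (1 - F_l(x))\,dx,$$

and

$$\frac{\partial S_0}{\partial \zeta_m^l} = -(1 - F_l(\zeta_m^l)) \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j)).$$

2. $\zeta_m^l$ may be an upper bound of the same integral, which happens when k = m and
i = l. The corresponding term is

$$S_l = \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \int_{\zeta_{m-1}^l}^{\zeta_m^l} (1 - F_l(x))\,dx,$$

and

$$\frac{\partial S_l}{\partial \zeta_m^l} = (1 - F_l(\zeta_m^l)) \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)).$$

3. Finally, $\zeta_m^l$ may participate in the product

$$\prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{k-1}^j)).$$

For i = 1 ... l - 1, this may happen when k = m + 1 and j = l, and the corresponding
term is

$$S_i = \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_i(x))\,dx,$$

with the derivative

$$\frac{\partial S_i}{\partial \zeta_m^l} = -f_l(\zeta_m^l) \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_i(x))\,dx.$$

For i = l + 1 ... n, k = m and j = l + n. The corresponding term is

$$S_i = \prod_{j=i+1}^{i+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \int_{\zeta_{m-1}^i}^{\zeta_m^i} (1 - F_i(x))\,dx,$$

with the derivative

$$\frac{\partial S_i}{\partial \zeta_m^l} = -f_l(\zeta_m^l) \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \int_{\zeta_{m-1}^i}^{\zeta_m^i} (1 - F_i(x))\,dx.$$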

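Since for i = l, $\zeta_m^l$ appears only in the integral, there is no other possibility for $\zeta_m^l$ to appear
in the expression, and therefore

$$\frac{\partial E_u}{\partial \zeta_m^l} = \sum_{i=0}^{n} \frac{\partial S_i}{\partial \zeta_m^l}.$$

The right-hand side of the sum above can be written as follows:

$$\begin{aligned} \sum_{i=0}^{n} \frac{\partial S_i}{\partial \zeta_m^l} &= -(1 - F_l(\zeta_m^l)) \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j)) + (1 - F_l(\zeta_m^l)) \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \\ &\quad - f_l(\zeta_m^l) \sum_{i=1}^{l-1} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_i(x))\,dx \\ &\quad - f_l(\zeta_m^l) \sum_{i=l+1}^{n} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \int_{\zeta_{m-1}^i}^{\zeta_m^i} (1 - F_i(x))\,dx. \end{aligned} \qquad (62)$$

However,

$$\sum_{i=l+1}^{n} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) \int_{\zeta_{m-1}^i}^{\zeta_m^i} (1 - F_i(x))\,dx = \sum_{i=l-n+1}^{0} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_{\#i}(x))\,dx. \qquad (63)$$

Substituting (63) into (62), we obtain

$$\begin{aligned} \sum_{i=0}^{n} \frac{\partial S_i}{\partial \zeta_m^l} &= (1 - F_l(\zeta_m^l)) \left( \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) - \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j)) \right) \\ &\quad - f_l(\zeta_m^l) \sum_{i=l-n+1}^{l-1} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_{\#i}(x))\,dx. \end{aligned} \qquad (64)$$

If $1 - F_l(\zeta_m^l)$ were 0, that would mean that the goal has been reached with the probability
of 1, and further scheduling would be redundant. Otherwise, the expression in (64) is 0 when

$$\frac{f_l(\zeta_m^l)}{1 - F_l(\zeta_m^l)} = \frac{\prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_{m-1}^j)) - \prod_{j=l+1}^{l+n-1} (1 - F_{\#j}(\zeta_m^j))}{\sum_{i=l-n+1}^{l-1} \prod_{j=i+1,\, \#j \neq l}^{i+n-1} (1 - F_{\#j}(\zeta_m^j)) \int_{\zeta_m^i}^{\zeta_{m+1}^i} (1 - F_{\#i}(x))\,dx},$$

which is equivalent to (59).

Equation (59) includes 2n - 1 variables ($\zeta_{m-1}^{l+1} = \zeta_{n(m-1)+l+1}$ to $\zeta_{m+1}^{l-1} = \zeta_{n(m+1)+l-1}$),
providing an implicit dependency of $\zeta_{m+1}^{l-1}$ on the remaining 2n - 2 variables.
Q.E.D.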



A.3 Proof of Lemma 2

The claim of the lemma is as follows:

The Euler-Lagrange conditions for the minimization problem (33) yield two strong invariants:

1. For processes $k_1$ and $k_2$ for which $\sigma_{k_1}$ and $\sigma_{k_2}$ are not on the border described by (34), the distribution and density functions satisfy

$$\frac{f_{k_1}(\sigma_{k_1})}{1 - F_{k_1}(\sigma_{k_1})} = \frac{f_{k_2}(\sigma_{k_2})}{1 - F_{k_2}(\sigma_{k_2})}. \qquad (65)$$

2. If the schedules of all the processes are not on the border described by (34), then either $c = 1$ or $f_k(\sigma_k) = 0$ for each $k$.

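Proof: Let $g(t, \sigma_1, \ldots, \sigma_n, \sigma_1', \ldots, \sigma_n')$ be the function under the integral sign of (33):

$$g(t, \sigma_1, \ldots, \sigma_n, \sigma_1', \ldots, \sigma_n') = \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i' \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j)). \qquad (66)$$

A necessary condition of Euler-Lagrange claims that a set of functions $\sigma_1, \ldots, \sigma_n$ provides a weak (local) minimum to the functional

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^\infty g(t, \sigma_1, \ldots, \sigma_n, \sigma_1', \ldots, \sigma_n')\,dt$$

only if these functions satisfy a system of equations of the form

$$g'_{\sigma_k} - \frac{d}{dt}\, g'_{\sigma_k'} = 0. \qquad (67)$$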



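In our case,

$$g'_{\sigma_k} = -\left( (1 - c) + c \sum_{i=1}^{n} \sigma_i' \right) f_k(\sigma_k) \prod_{j \neq k} (1 - F_j(\sigma_j)), \qquad (68)$$

and

$$\frac{d}{dt}\, g'_{\sigma_k'} = c\, \frac{d}{dt} \prod_{j=1}^{n} (1 - F_j(\sigma_j)) = -c \sum_{l=1}^{n} \sigma_l' f_l(\sigma_l) \prod_{j \neq l} (1 - F_j(\sigma_j)). \qquad (69)$$

Substituting the last expression into (67), we obtain

$$g'_{\sigma_1} = g'_{\sigma_2} = \ldots = g'_{\sigma_n} = -c \sum_{l=1}^{n} \sigma_l' f_l(\sigma_l) \prod_{j \neq l} (1 - F_j(\sigma_j)),$$

and by (68) for every $k_1$ and $k_2$

$$f_{k_1}(\sigma_{k_1}) \prod_{j \neq k_1} (1 - F_j(\sigma_j)) = f_{k_2}(\sigma_{k_2}) \prod_{j \neq k_2} (1 - F_j(\sigma_j)).$$

We can ignore the case where one of the terms $1 - F_j(\sigma_j)$ is 0. Indeed, this is possible only if the goal is reached by process $j$ with probability 1, and in this case no optimization is needed. Therefore, we obtain

$$f_{k_1}(\sigma_{k_1})(1 - F_{k_2}(\sigma_{k_2})) = f_{k_2}(\sigma_{k_2})(1 - F_{k_1}(\sigma_{k_1})), \qquad (70)$$

which is equivalent to (65).

Let us now show the correctness of the second invariant. By (69) and (65), we obtain

$$\frac{d}{dt}\, g'_{\sigma_k'} = -c \sum_{l=1}^{n} \sigma_l' f_l(\sigma_l) \prod_{j \neq l} (1 - F_j(\sigma_j)) = -c \sum_{l=1}^{n} \sigma_l' \frac{f_l(\sigma_l)}{1 - F_l(\sigma_l)} \prod_{j=1}^{n} (1 - F_j(\sigma_j)) = -c \left( \sum_{l=1}^{n} \sigma_l' \right) f_k(\sigma_k) \prod_{j \neq k} (1 - F_j(\sigma_j)).$$

By (36) we get

$$g'_{\sigma_k} - \frac{d}{dt}\, g'_{\sigma_k'} = -\left( (1 - c) + c \sum_{i=1}^{n} \sigma_i' \right) f_k(\sigma_k) \prod_{j \neq k} (1 - F_j(\sigma_j)) + c \left( \sum_{l=1}^{n} \sigma_l' \right) f_k(\sigma_k) \prod_{j \neq k} (1 - F_j(\sigma_j))$$
$$= -(1 - c) f_k(\sigma_k) \prod_{j \neq k} (1 - F_j(\sigma_j)) = 0.$$

Since we ignore the case when $1 - F_j(\sigma_j) = 0$, the second invariant is correct.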
Q.E.D. 
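
The algebra behind the second invariant can be checked numerically. The sketch below is only an illustration and is not part of the original proof: it assumes exponential distributions $F_j(x) = 1 - e^{-\lambda x}$ with a common rate, so that the hazards are equal and invariant (65) holds automatically. The function name `euler_lagrange_residual` and all numeric values are hypothetical.

```python
import math


def euler_lagrange_residual(k, sigma, dsigma, rates, c):
    """Return (lhs, rhs): lhs is g'_{sigma_k} - d/dt g'_{sigma_k'} computed from
    (68) and (69); rhs is -(1 - c) f_k prod_{j != k} (1 - F_j).

    Assumption (not from the paper): F_j(x) = 1 - exp(-rates[j] * x).
    """
    n = len(sigma)
    surv = [math.exp(-rates[j] * sigma[j]) for j in range(n)]  # 1 - F_j(sigma_j)
    dens = [rates[j] * surv[j] for j in range(n)]              # f_j(sigma_j)

    def prod_except(skip):
        p = 1.0
        for j in range(n):
            if j != skip:
                p *= surv[j]
        return p

    total_intensity = sum(dsigma)
    # Equation (68)
    g_sigma = -((1 - c) + c * total_intensity) * dens[k] * prod_except(k)
    # Equation (69)
    ddt_g_dsigma = -c * sum(dsigma[l] * dens[l] * prod_except(l) for l in range(n))
    lhs = g_sigma - ddt_g_dsigma
    rhs = -(1 - c) * dens[k] * prod_except(k)
    return lhs, rhs
```

With equal rates the two returned values agree, and for $c = 1$ the residual vanishes, which matches the statement that the Euler-Lagrange condition forces either $c = 1$ or $f_k(\sigma_k) = 0$.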



A.4 Proof of Lemma 3

The claim of the lemma is as follows:

If an optimal solution exists, then there exists an optimal solution $\sigma_1, \ldots, \sigma_n$, such that at each time $t$ all the resources are consumed, i.e.,

$$\forall t \quad \sum_{i=1}^{n} \sigma_i'(t) = 1.$$

In the case where time cost is not zero ($c \neq 1$), the equality above is a necessary condition for solution optimality.

Proof: We know that $\{\sigma_i\}$ provide a minimum for the expression (33)

$$\int_0^\infty \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i' \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt.$$

Let us assume that in some time interval $[t_0, t_1]$, $\{\sigma_i\}$ do not satisfy the lemma's constraints. However, it is possible to use the same amount of resources more effectively. Let us consider a linear time warp $\nu(t) = \alpha t + \beta$ on the time interval $[t_0, t_1]$, satisfying $\nu(t_0) = t_0$. From the last condition, it follows that $\beta = t_0(1 - \alpha)$. Let $t_1'$ be a point where $\nu(t)$ achieves $t_1$, i.e., $t_1' = t_0 + (t_1 - t_0)/\alpha$. Let us consider a set of new objective schedule functions $\tilde\sigma_i(t)$ of the form

$$\tilde\sigma_i(t) = \begin{cases} \sigma_i(t), & t \le t_0, \\ \sigma_i(\alpha t + \beta), & t_0 \le t \le t_1', \\ \sigma_i(t + t_1 - t_1'), & t > t_1'. \end{cases}$$

Thus, $\tilde\sigma_i(t)$ behaves as $\sigma_i(t)$ before $t_0$, as $\sigma_i(t)$ with a time shift after $t_1'$, and as a linearly speeded up version of $\sigma_i(t)$ in the interval $[t_0, t_1']$. Since $\nu(t_0) = t_0$ and $\nu(t_1') = t_1$, $\tilde\sigma_i(t)$ is continuous at the points $t_0$ and $t_1'$.

$\tilde\sigma_i'(t)$ is equal to $\alpha \sigma_i'(\alpha t + \beta)$ within the interval $[t_0, t_1']$, and to $\sigma_i'$ (taken at the correspondingly shifted time) outside this interval. By the contradiction assumption, $\sigma_i$ do not meet the lemma constraints in $[t_0, t_1]$, and thus we can take $\alpha > 1$ such that $\alpha \sum_{i=1}^{n} \sigma_i'(t) \le 1$ on this interval, leading to valid functions $\tilde\sigma_i(t)$. Using $\tilde\sigma_i(t)$ in (33), we obtain



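$$E_u(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \int_0^\infty \left( (1 - c) + c \sum_{i=1}^{n} \tilde\sigma_i'(t) \right) \prod_{j=1}^{n} (1 - F_j(\tilde\sigma_j(t)))\,dt =$$
$$\int_0^{t_0} \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i'(t) \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_0}^{t_1'} \left( (1 - c) + c\,\alpha \sum_{i=1}^{n} \sigma_i'(\alpha t + \beta) \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(\alpha t + \beta)))\,dt\ +$$
$$\int_{t_1'}^{\infty} \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i'(t + t_1 - t_1') \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(t + t_1 - t_1')))\,dt.$$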



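By substituting $x = \alpha t + \beta$ in the second term of the last sum, and $x = t + t_1 - t_1'$ in the third term, we obtain

$$E_u(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \int_0^{t_0} \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i'(t) \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_0}^{t_1} \left( \frac{1 - c}{\alpha} + c \sum_{i=1}^{n} \sigma_i'(x) \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(x)))\,dx\ +$$
$$\int_{t_1}^{\infty} \left( (1 - c) + c \sum_{i=1}^{n} \sigma_i'(x) \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j(x)))\,dx =$$
$$E_u(\sigma_1, \ldots, \sigma_n) - (1 - c)\left(1 - \frac{1}{\alpha}\right) \int_{t_0}^{t_1} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt.$$

Since $\alpha > 1$, the last term is non-negative, and therefore

$$E_u(\tilde\sigma_1, \ldots, \tilde\sigma_n) \le E_u(\sigma_1, \ldots, \sigma_n),$$

meaning that the set $\{\tilde\sigma_i\}$ provides a solution of at least the same quality as $\{\sigma_i\}$. If $c \neq 1$, this contradicts the optimality of the original schedule, and if $c = 1$, the new schedule will also be optimal.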
Q.E.D. 
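
A small numerical experiment illustrates the effect of the time warp. The sketch below is an illustration under assumptions that do not come from the paper: a single process with an exponential distribution $F(x) = 1 - e^{-\lambda x}$ and a schedule $\sigma(t) = r t$ that uses only a fraction $r$ of the resource. Warping time globally with $\alpha = 1/r$ turns it into the full-intensity schedule $\sigma(t) = t$. The function name `expected_cost` and the integration parameters are hypothetical.

```python
import math


def expected_cost(rate_of_use, c, lam, horizon=60.0, steps=60000):
    """Trapezoidal estimate of objective (33) for one process with
    sigma(t) = rate_of_use * t and F(x) = 1 - exp(-lam * x) (assumed).

    The integrand is ((1 - c) + c * sigma'(t)) * (1 - F(sigma(t))).
    The infinite integral is truncated at `horizon`, where the tail is negligible.
    """
    dt = horizon / steps

    def integrand(t):
        return ((1 - c) + c * rate_of_use) * math.exp(-lam * rate_of_use * t)

    total = 0.5 * (integrand(0.0) + integrand(horizon))
    total += sum(integrand(i * dt) for i in range(1, steps))
    return total * dt
```

For this family the integral has the closed form $((1 - c) + c r)/(\lambda r)$. With $c = 0.5$ and $\lambda = 1$, the half-intensity schedule costs about 1.5 and the warped, full-intensity schedule about 1.0, while for $c = 1$ both cost $1/\lambda$, in line with the two cases of the lemma.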



A.5 Proof of Theorem 4

The claim of the theorem is as follows:

Let the set of functions $\{\sigma_i\}$ be a solution of minimization problem (6) under constraints (34). Let $t_0$ be a point where the hazard functions of all the processes $h_i(\sigma_i(t))$ are continuous, and let $A_k$ be the process active at $t_0$ ($\sigma_k'(t_0) > 0$), such that for any other process $A_i$

$$h_i(\sigma_i(t_0)) < h_k(\sigma_k(t_0)). \qquad (71)$$

Then at $t_0$ process $k$ consumes all the resources, i.e., $\sigma_k'(t_0) = 1$.

Proof: First we want to prove the theorem for the case of two processes, and then to generalize the proof to the case of $n$ processes. Assume that $\sigma_1(t)$ and $\sigma_2(t)$ provide the optimal solution, and at some point $t_0$, $\sigma_1'(t_0) > 0$ and

$$\frac{f_1(\sigma_1(t_0))}{1 - F_1(\sigma_1(t_0))} > \frac{f_2(\sigma_2(t_0))}{1 - F_2(\sigma_2(t_0))}. \qquad (72)$$

From the continuity of the functions $h_i(\sigma_i(t))$ at the point $t_0$, it follows that there exists some neighborhood $U(t_0)$ of $t_0$, such that for each two points $t', t''$ in this neighborhood $h_1(\sigma_1(t')) > h_2(\sigma_2(t''))$, i.e.,

$$\min_{t' \in U(t_0)} \frac{f_1(\sigma_1(t'))}{1 - F_1(\sigma_1(t'))} > \max_{t'' \in U(t_0)} \frac{f_2(\sigma_2(t''))}{1 - F_2(\sigma_2(t''))}. \qquad (73)$$

Let us consider some interval $[t_0, t_1] \subset U(t_0)$. In order to make the proof more readable, we introduce the following notation (for this proof only):

• We denote $\sigma_1(t)$ by $\sigma(t)$. By Lemma 3, $\sigma_2(t) = t - \sigma(t)$.

• We denote $\sigma(t_0)$ by $\sigma^0$ and $\sigma(t_1)$ by $\sigma^1$.

In the interval $[t_0, t_1]$ the first process obtains $\sigma^1 - \sigma^0$ resources, and the second process obtains $(t_1 - t_0) - (\sigma^1 - \sigma^0)$ resources. Let us consider a special resource distribution $\tilde\sigma$, which first gives all the resources to the first process, and then to the second process, keeping the same quantity of resources as $\sigma$:

$$\tilde\sigma(t) = \begin{cases} \sigma(t), & t \le t_0, \\ t - t_0 + \sigma^0, & t_0 \le t \le t_0 + \sigma^1 - \sigma^0, \\ \sigma^1, & t_0 + \sigma^1 - \sigma^0 \le t \le t_1, \\ \sigma(t), & t \ge t_1. \end{cases}$$

It is easy to see that $\tilde\sigma(t)$ is continuous at the points $t_0$, $t_1$, and $t_0 + \sigma^1 - \sigma^0$. We want to show that, unless the first process consumes all the resources at the beginning, the schedule produced by $\tilde\sigma$ outperforms the schedule produced by $\sigma$.

Let $t^* = t_0 + \sigma^1 - \sigma^0$, which corresponds to the time when the first process would have consumed all its resources had it been working with the maximal intensity. First, we want to show that in the interval $[t_0, t^*]$

$$(1 - F_1(\sigma(t)))(1 - F_2(t - \sigma(t))) \ge (1 - F_1(t - t_0 + \sigma^0))(1 - F_2(t_0 - \sigma^0)). \qquad (74)$$

Let

$$\nu(t) = (t - t_0 + \sigma^0) - \sigma(t). \qquad (75)$$

The inequality (74) becomes

$$(1 - F_1(t - t_0 + \sigma^0 - \nu(t)))(1 - F_2(t_0 - \sigma^0 + \nu(t))) \ge (1 - F_1(t - t_0 + \sigma^0))(1 - F_2(t_0 - \sigma^0)). \qquad (76)$$

Let us find a value of $x = \nu(t)$ that provides the minimum to the left-hand side of (76) for the fixed $t$. Let us denote

$$G(x) = (1 - F_1(t - t_0 + \sigma^0 - x))(1 - F_2(t_0 - \sigma^0 + x)).$$

Then,

$$G'(x) = f_1(t - t_0 + \sigma^0 - x)(1 - F_2(t_0 - \sigma^0 + x)) - f_2(t_0 - \sigma^0 + x)(1 - F_1(t - t_0 + \sigma^0 - x)).$$

Since a valid $\sigma(t)$ in the interval $[t_0, t_1]$ obtains values between $\sigma^0$ and $\sigma^1$, by (75) we have

$$t - t_0 + \sigma^0 - x \in [\sigma^0, \sigma^1],$$
$$t_0 - \sigma^0 + x \in [t_0 - \sigma^0, t_1 - \sigma^1].$$

Therefore, there exist $t', t'' \in [t_0, t_1]$, such that $\sigma_1(t') = \sigma(t') = t - t_0 + \sigma^0 - x$ and $\sigma_2(t'') = t'' - \sigma(t'') = t_0 - \sigma^0 + x$. By (73) we obtain $G'(x) > 0$, meaning that $G(x)$ monotonically increases. Besides, by (75) we have $x = \nu(t) \ge 0$ (since $\sigma'(t) \le 1$), and therefore $G(x)$ obtains its minimal value when $x = 0$. Therefore, if we denote by $Ran(t)$ the set of valid values for $\nu(t)$,

$$(1 - F_1(\sigma))(1 - F_2(t - \sigma)) = (1 - F_1(t - t_0 + \sigma^0 - \nu(t)))(1 - F_2(t_0 - \sigma^0 + \nu(t))) \ge$$
$$\min_{x \in Ran(t)} (1 - F_1(t - t_0 + \sigma^0 - x))(1 - F_2(t_0 - \sigma^0 + x)) =$$
$$(1 - F_1(t - t_0 + \sigma^0))(1 - F_2(t_0 - \sigma^0)),$$

and the strict equality occurs if and only if $\sigma(t) = t - t_0 + \sigma^0$. Thus,

$$(1 - F_1(\sigma))(1 - F_2(t - \sigma)) \ge (1 - F_1(\tilde\sigma))(1 - F_2(t - \tilde\sigma))$$

for $t \in [t_0, t^*]$.

Let us now show the correctness of the same statement in the interval $[t^*, t_1]$, which is equivalent to the inequality

$$(1 - F_1(\sigma(t)))(1 - F_2(t - \sigma(t))) \ge (1 - F_1(\sigma^1))(1 - F_2(t - \sigma^1)). \qquad (77)$$

The proof is similar. Let

$$\nu(t) = \sigma^1 - \sigma(t). \qquad (78)$$

The inequality (77) becomes

$$(1 - F_1(\sigma^1 - \nu(t)))(1 - F_2(t - \sigma^1 + \nu(t))) \ge (1 - F_1(\sigma^1))(1 - F_2(t - \sigma^1)). \qquad (79)$$

As before, we find a value of $x = \nu(t)$ that provides the minimum to the left-hand side of (79):

$$G(x) = (1 - F_1(\sigma^1 - x))(1 - F_2(t - \sigma^1 + x)).$$

The derivative of $G(x)$ is

$$G'(x) = f_1(\sigma^1 - x)(1 - F_2(t - \sigma^1 + x)) - f_2(t - \sigma^1 + x)(1 - F_1(\sigma^1 - x)),$$

and since a valid $\sigma(t)$ in the interval $[t_0, t_1]$ obtains values between $\sigma^0$ and $\sigma^1$, by (78) we have

$$\sigma^1 - x \in [\sigma^0, \sigma^1],$$
$$t - \sigma^1 + x \in [t_0 - \sigma^0, t_1 - \sigma^1].$$

Therefore, there exist $t', t'' \in [t_0, t_1]$, such that $\sigma_1(t') = \sigma(t') = \sigma^1 - x$ and $\sigma_2(t'') = t'' - \sigma(t'') = t - \sigma^1 + x$. By (73), $G'(x) > 0$, and therefore $G(x)$ monotonically increases. Since $x = \sigma^1 - \sigma(t) \ge 0$, $G(x) \ge G(0)$. Thus, for $t \in [t^*, t_1]$,

$$(1 - F_1(\sigma))(1 - F_2(t - \sigma)) = (1 - F_1(\sigma^1 - \nu(t)))(1 - F_2(t - \sigma^1 + \nu(t))) \ge$$
$$\min_{x \in Ran(t)} (1 - F_1(\sigma^1 - x))(1 - F_2(t - \sigma^1 + x)) = (1 - F_1(\sigma^1))(1 - F_2(t - \sigma^1)),$$

and the strict equality occurs if and only if $\sigma(t) = \sigma^1$.

Combining this result with the previous one, we obtain that

$$(1 - F_1(\sigma))(1 - F_2(t - \sigma)) \ge (1 - F_1(\tilde\sigma))(1 - F_2(t - \tilde\sigma))$$

holds for every $t \in [t_0, t_1]$. Since $\tilde\sigma(t)$ behaves as $\sigma(t)$ outside this interval, $E_u(\sigma) \ge E_u(\tilde\sigma)$. Besides, since the equality is obtained if and only if $\sigma = \tilde\sigma$, and since $E_u(\sigma)$ is optimal, we obtain that $\sigma = \tilde\sigma$, and therefore the first process will take all the resources in some interval $[t_0, t_1]$.

The proof for $n$ processes is exactly the same. Let $\{\sigma_i\}$ provide the optimal solution, and at the point $t_0$ there is process $k$, such that for each $j \neq k$

$$h_k(\sigma_k(t_0)) > h_j(\sigma_j(t_0)).$$

From the continuity of the functions $h_i(\sigma_i(t))$ at the point $t_0$, it follows that there exists some neighborhood $U(t_0)$ of $t_0$, such that

$$\min_{t' \in U(t_0)} h_k(\sigma_k(t')) > \max_{i \neq k}\ \max_{t'' \in U(t_0)} h_i(\sigma_i(t'')). \qquad (80)$$

Let us take any process $l \neq k$, and let

$$y(t) = \sigma_k(t) + \sigma_l(t).$$

Now we can repeat the above proof while substituting $y(t)$ instead of $t$ under the function sign:

$$\tilde\sigma_k(t) = \begin{cases} \sigma_k(t), & y(t) \le y(t_0), \\ y(t) - y(t_0) + \sigma_k(t_0), & y(t_0) \le y(t) \le y(t_0) + \sigma_k(t_1) - \sigma_k(t_0), \\ \sigma_k(t_1), & y(t_0) + \sigma_k(t_1) - \sigma_k(t_0) \le y(t) \le y(t_1), \\ \sigma_k(t), & y(t) \ge y(t_1). \end{cases}$$

The substitution above produces a valid schedule due to the monotonicity of $y(t)$. The rest of the proof remains unchanged.

Q.E.D. 
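
The rearrangement used in the two-process case can be illustrated numerically. The sketch below is not the authors' construction in full generality: it assumes exponential distributions with constant hazards $h_1 = 2 > h_2 = 1$, $t_0 = 0$, $t_1 = 2$, and an original schedule $\sigma(t) = t/2$ that splits the resource evenly. All names and numeric values are hypothetical.

```python
import math

LAM1, LAM2 = 2.0, 1.0   # assumed constant hazards of processes 1 and 2 (h_1 > h_2)
T0, T1 = 0.0, 2.0       # assumed interval [t_0, t_1]


def sigma(t):
    """Original schedule of process 1: half of the resource."""
    return 0.5 * t


def sigma_tilde(t):
    """Rearranged schedule on [T0, T1]: process 1 first, at full intensity,
    until it has received the same total amount sigma(T1)."""
    s0, s1 = sigma(T0), sigma(T1)
    return min(t - T0 + s0, s1)


def survival_product(s, t):
    """(1 - F_1(s)) * (1 - F_2(t - s)) for exponential F_1, F_2."""
    return math.exp(-LAM1 * s) * math.exp(-LAM2 * (t - s))


def cost_on_interval(schedule, steps=2000):
    """Trapezoidal estimate of the contribution of [T0, T1] to objective (6)."""
    h = (T1 - T0) / steps
    values = [survival_product(schedule(T0 + i * h), T0 + i * h) for i in range(steps + 1)]
    return h * (0.5 * (values[0] + values[-1]) + sum(values[1:-1]))
```

On this example the survival product under `sigma_tilde` never exceeds the one under `sigma` on $[t_0, t_1]$, so the rearranged schedule has a smaller contribution to (6), which is the pointwise inequality the proof establishes through (74) and (77).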



A.6 Proof of Theorem 5

The claim of the theorem is as follows:

An active process will remain active and consume all resources as long as its hazard function is monotonically increasing.

Proof: The proof is by contradiction. Let $\{\sigma_j\}$ form an optimal schedule. Assume that at some point $t_1$ process $A_k$ is suspended, while its hazard function $h_k(\sigma_k(t_1))$ is monotonically increasing at $t_1$.

Let us assume first that at some point $t_2$ process $A_k$ becomes active again. Since we do not consider the case of making a process active at a single point, there exists some $\Delta > 0$, such that $A_k$ is active in the intervals $[t_1 - \Delta, t_1]$ and $[t_2, t_2 + \Delta]$. $A_k$ has been stopped at a point of monotonicity of the hazard function, and therefore, by Theorem 4, in these intervals $A_k$ is the only active process. We consider two alternative scenarios. In the first one, we allow $A_k$ to be active for additional $\Delta$ time starting at $t_1$ (i.e., shifting its idle period by $\Delta$), while in the second we suspend $A_k$ by $\Delta$ earlier.

For the first scenario, the scheduling functions have the following form:

$$\sigma_k^a(t) = \begin{cases} \sigma_k(t), & t \le t_1, \\ \sigma_k(t_1) + (t - t_1), & t_1 \le t \le t_1 + \Delta, \\ \sigma_k(t - \Delta) + \Delta = \sigma_k(t_1) + \Delta, & t_1 + \Delta \le t \le t_2 + \Delta, \\ \sigma_k(t), & t \ge t_2 + \Delta; \end{cases} \qquad (81)$$

$$\sigma_j^a(t) = \begin{cases} \sigma_j(t), & t \le t_1, \\ \sigma_j(t_1), & t_1 \le t \le t_1 + \Delta, \\ \sigma_j(t - \Delta), & t_1 + \Delta \le t \le t_2 + \Delta, \\ \sigma_j(t), & t \ge t_2 + \Delta. \end{cases} \qquad (82)$$

It is possible to see that these scheduling functions are continuous and satisfy invariant (39), which makes this set a suitable candidate for optimality.

Substituting these values of $\sigma$ into (6), we obtain
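$$E_u(\sigma_1^a, \ldots, \sigma_n^a) = \int_0^{t_1} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_1}^{t_1+\Delta} (1 - F_k(\sigma_k(t_1) + (t - t_1))) \prod_{j \neq k} (1 - F_j(\sigma_j(t_1)))\,dt\ +$$
$$\int_{t_1+\Delta}^{t_2+\Delta} (1 - F_k(\sigma_k(t_1) + \Delta)) \prod_{j \neq k} (1 - F_j(\sigma_j(t - \Delta)))\,dt + \int_{t_2+\Delta}^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt =$$
$$\int_0^{t_1} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt + \prod_{j \neq k} (1 - F_j(\sigma_j(t_1))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx\ +$$
$$\int_{t_1}^{t_2} (1 - F_k(\sigma_k(t_1) + \Delta)) \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt + \int_{t_2+\Delta}^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt.$$

Subtracting $E_u(\sigma_1^a, \ldots, \sigma_n^a)$ from $E_u(\sigma_1, \ldots, \sigma_n)$ given by (6), we get

$$E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^a, \ldots, \sigma_n^a) =$$
$$\int_{t_1}^{t_2} \left[ (1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) + \Delta)) \right] \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_2}^{t_2+\Delta} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt - \prod_{j \neq k} (1 - F_j(\sigma_j(t_1))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx. \qquad (83)$$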



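Let us consider the first term of the last equation. Since in the interval $[t_1, t_2]$ $\sigma_k(t) = \sigma_k(t_1)$, in this interval

$$(1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) + \Delta)) = (1 - F_k(\sigma_k(t_1))) - (1 - F_k(\sigma_k(t_1) + \Delta)) =$$
$$-\int_0^{\Delta} d(1 - F_k(\sigma_k(t_1) + x)) = \int_0^{\Delta} f_k(\sigma_k(t_1) + x)\,dx = \int_0^{\Delta} h_k(\sigma_k(t_1) + x)(1 - F_k(\sigma_k(t_1) + x))\,dx.$$

Due to monotonicity of $h_k(\sigma_k)$ in $t_1$,

$$(1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) + \Delta)) =$$
$$\int_0^{\Delta} h_k(\sigma_k(t_1) + x)(1 - F_k(\sigma_k(t_1) + x))\,dx > h_k(\sigma_k(t_1)) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx,$$

which leads to

$$\int_{t_1}^{t_2} \left[ (1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) + \Delta)) \right] \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt >$$
$$h_k(\sigma_k(t_1)) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx \int_{t_1}^{t_2} \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt. \qquad (84)$$

Let us now consider the second term of (83). Since in the interval $[t_2, t_2 + \Delta]$ only $A_k$ is active, in this interval

$$\sigma_j(t) = \begin{cases} \sigma_j(t_2), & j \neq k, \\ \sigma_k(t_1) + (t - t_2), & j = k. \end{cases}$$

Thus,

$$\int_{t_2}^{t_2+\Delta} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt = \prod_{j \neq k} (1 - F_j(\sigma_j(t_2))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx. \qquad (85)$$

Substituting (84) and (85) into (83), we obtain

$$E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^a, \ldots, \sigma_n^a) > \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) + x))\,dx\ \times$$
$$\left[ h_k(\sigma_k(t_1)) \int_{t_1}^{t_2} \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt + \prod_{j \neq k} (1 - F_j(\sigma_j(t_2))) - \prod_{j \neq k} (1 - F_j(\sigma_j(t_1))) \right]. \qquad (86)$$

The proof for the second scenario, where $A_k$ is suspended by $\Delta$ earlier, is similar. For this scenario, the scheduling functions $\sigma_k^i(t)$ and $\sigma_j^i(t)$ for $j \neq k$ can be represented as follows:

$$\sigma_k^i(t) = \begin{cases} \sigma_k(t), & t \le t_1 - \Delta, \\ \sigma_k(t_1 - \Delta) = \sigma_k(t_1) - \Delta, & t_1 - \Delta \le t \le t_2 - \Delta, \\ \sigma_k(t_1 - \Delta) + (t - (t_2 - \Delta)) = \sigma_k(t_1) + (t - t_2), & t_2 - \Delta \le t \le t_2, \\ \sigma_k(t), & t \ge t_2; \end{cases} \qquad (87)$$

$$\sigma_j^i(t) = \begin{cases} \sigma_j(t), & t \le t_1 - \Delta, \\ \sigma_j(t + \Delta), & t_1 - \Delta \le t \le t_2 - \Delta, \\ \sigma_j(t_2), & t_2 - \Delta \le t \le t_2, \\ \sigma_j(t), & t \ge t_2. \end{cases} \qquad (88)$$

As before, these scheduling functions are continuous and satisfy invariant (39).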
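Substituting $\sigma^i$ into (6), we obtain

$$E_u(\sigma_1^i, \ldots, \sigma_n^i) = \int_0^{t_1-\Delta} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_1-\Delta}^{t_2-\Delta} (1 - F_k(\sigma_k(t_1) - \Delta)) \prod_{j \neq k} (1 - F_j(\sigma_j(t + \Delta)))\,dt\ +$$
$$\int_{t_2-\Delta}^{t_2} (1 - F_k(\sigma_k(t_1) + (t - t_2))) \prod_{j \neq k} (1 - F_j(\sigma_j(t_2)))\,dt + \int_{t_2}^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt =$$
$$\int_0^{t_1-\Delta} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt + \prod_{j \neq k} (1 - F_j(\sigma_j(t_2))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx\ +$$
$$\int_{t_1}^{t_2} (1 - F_k(\sigma_k(t_1) - \Delta)) \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt + \int_{t_2}^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt.$$

Subtracting $E_u(\sigma_1^i, \ldots, \sigma_n^i)$ from $E_u(\sigma_1, \ldots, \sigma_n)$ given by (6), we get

$$E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^i, \ldots, \sigma_n^i) =$$
$$\int_{t_1}^{t_2} \left[ (1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) - \Delta)) \right] \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt\ +$$
$$\int_{t_1-\Delta}^{t_1} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt - \prod_{j \neq k} (1 - F_j(\sigma_j(t_2))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx. \qquad (89)$$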

As in the first scenario, in the interval $[t_1, t_2]$

$$(1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) - \Delta)) = (1 - F_k(\sigma_k(t_1))) - (1 - F_k(\sigma_k(t_1) - \Delta)) =$$
$$\int_{-\Delta}^{0} d(1 - F_k(\sigma_k(t_1) + x)) = -\int_{-\Delta}^{0} f_k(\sigma_k(t_1) + x)\,dx =$$
$$-\int_0^{\Delta} f_k(\sigma_k(t_1) - x)\,dx = -\int_0^{\Delta} h_k(\sigma_k(t_1) - x)(1 - F_k(\sigma_k(t_1) - x))\,dx.$$

Due to monotonicity of $h_k(\sigma_k)$ in $t_1$,

$$(1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) - \Delta)) =$$
$$-\int_0^{\Delta} h_k(\sigma_k(t_1) - x)(1 - F_k(\sigma_k(t_1) - x))\,dx > -h_k(\sigma_k(t_1)) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx,$$

which leads to

$$\int_{t_1}^{t_2} \left[ (1 - F_k(\sigma_k(t))) - (1 - F_k(\sigma_k(t_1) - \Delta)) \right] \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt >$$
$$-h_k(\sigma_k(t_1)) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx \int_{t_1}^{t_2} \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt. \qquad (90)$$



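The transformations for the second term of (89) are also similar to the previous scenario. Since in the interval $[t_1 - \Delta, t_1]$ only $A_k$ is active, in this interval

$$\sigma_j(t) = \begin{cases} \sigma_j(t_1), & j \neq k, \\ \sigma_k(t_1) - (t_1 - t), & j = k. \end{cases}$$

Thus,

$$\int_{t_1-\Delta}^{t_1} \prod_{j=1}^{n} (1 - F_j(\sigma_j(t)))\,dt = \prod_{j \neq k} (1 - F_j(\sigma_j(t_1))) \int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx. \qquad (91)$$

Substituting (90) and (91) into (89), we obtain

$$E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^i, \ldots, \sigma_n^i) > -\int_0^{\Delta} (1 - F_k(\sigma_k(t_1) - x))\,dx\ \times$$
$$\left[ h_k(\sigma_k(t_1)) \int_{t_1}^{t_2} \prod_{j \neq k} (1 - F_j(\sigma_j(t)))\,dt + \prod_{j \neq k} (1 - F_j(\sigma_j(t_2))) - \prod_{j \neq k} (1 - F_j(\sigma_j(t_1))) \right]. \qquad (92)$$

By (86) and (92),

$$\mathrm{sign}(E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^a, \ldots, \sigma_n^a)) = -\mathrm{sign}(E_u(\sigma_1, \ldots, \sigma_n) - E_u(\sigma_1^i, \ldots, \sigma_n^i)), \qquad (93)$$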

and therefore one of these scenarios leads to a better schedule, which contradicts the optimality of the original one.

The proof for the case where control does not return to $A_k$ at all is exactly the same and is omitted here. Informally, it can be viewed as replacing $t_2$ by $\infty$ in all the formulas above, and the results are the same.
Q.E.D. 
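
A concrete instance makes the theorem easier to picture. The sketch below is an illustration under assumptions not taken from the paper: two identical processes with the Weibull-type distribution $F(x) = 1 - e^{-x^2}$, whose hazard $h(x) = 2x$ is monotonically increasing, scheduled under objective (6). It compares running $A_1$ without interruption against suspending $A_1$ at $t_1$, running $A_2$ for $\Delta$, and then resuming $A_1$. All function names are hypothetical.

```python
import math


def hazard(x):
    """Hazard f/(1 - F) of F(x) = 1 - exp(-x**2); equals 2x, increasing in x."""
    return 2.0 * x


def gauss_integral(a, b):
    """Integral of exp(-x**2) over [a, b], via the error function."""
    return math.sqrt(math.pi) / 2.0 * (math.erf(b) - math.erf(a))


def cost_uninterrupted():
    """Objective (6) when A_1 alone runs forever: integral of exp(-t**2) on [0, inf)."""
    return math.sqrt(math.pi) / 2.0


def cost_interrupted(t1, delta):
    """Objective (6) when A_1 runs on [0, t1], A_2 runs on [t1, t1 + delta],
    and A_1 resumes afterwards and keeps all the resources."""
    first = gauss_integral(0.0, t1)                          # only A_1 has progressed
    detour = math.exp(-t1 ** 2) * gauss_integral(0.0, delta)  # A_1 frozen at t1
    rest = math.exp(-delta ** 2) * (math.sqrt(math.pi) / 2.0 - gauss_integral(0.0, t1))
    return first + detour + rest
```

For example, with $t_1 = \Delta = 0.5$ the interrupted schedule has a strictly larger expected cost than the uninterrupted one, and the two coincide when $\Delta = 0$. This is a single example consistent with the theorem, not a substitute for the proof.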



A.7 Proof of Theorem 6

The claim of the theorem is as follows:

If no time cost is taken into account ($c = 1$), the model with shared resources under intensity control settings is equivalent to the model with independent processes under suspend-resume control settings. Namely, given a suspend-resume solution for the model with independent processes, we may reconstruct an intensity-based solution with the same cost for the model with shared resources and vice versa.

Proof: Let $E^*_{shared}$ be the optimal value for the framework with shared resources, and $E^*_{independent}$ be the optimal value for the framework with independent processes. Since $c = 1$, the two problems minimize the same expression

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^\infty \left( \sum_{i=1}^{n} \sigma_i' \right) \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt \to \min, \qquad (94)$$

and since each set $\{\sigma_i\}$ satisfying the resource sharing constraints automatically satisfies the process independence constraints, we obtain

$$E^*_{independent} \le E^*_{shared}.$$

Let us prove that

$$E^*_{shared} \le E^*_{independent}.$$

Assume that a set of functions $\sigma_1, \sigma_2, \ldots, \sigma_n$ is an optimal solution for the problem with independent processes, i.e.,

$$E_u(\sigma_1, \ldots, \sigma_n) = E^*_{independent}.$$

We want to construct a set of functions $\{\tilde\sigma_i\}$ satisfying the resource sharing constraints, such that

$$E_u(\tilde\sigma_1, \ldots, \tilde\sigma_n) = E_u(\sigma_1, \ldots, \sigma_n).$$

Let us consider the set of discontinuity points of $\sigma_i'$

$$T = \{ t \mid \exists i : \sigma_i'(t - \epsilon) \neq \sigma_i'(t + \epsilon) \}.$$

In our model this set is countable, and we can write it as a sorted sequence $t_0 = 0 < t_1 < \ldots < t_k < \ldots$. The expected schedule cost in this case will have the form

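$$E_u(\sigma_1, \ldots, \sigma_n) = \sum_{j=0}^{\infty} E_{u_j}(\sigma_1, \ldots, \sigma_n),$$

where

$$E_{u_j}(\sigma_1, \ldots, \sigma_n) = \int_{t_j}^{t_{j+1}} \left( \sum_{i=1}^{n} \sigma_i' \right) \prod_{l=1}^{n} (1 - F_l(\sigma_l))\,dt.$$

We want to construct the functions $\tilde\sigma_i$ incrementally. For each time interval $[t_j, t_{j+1}]$ we define a corresponding point $\tilde t_j$ and a set of functions $\tilde\sigma_i$, such that

$$\tilde E_{u_j}(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \int_{\tilde t_j}^{\tilde t_{j+1}} \left( \sum_{l=1}^{n} \tilde\sigma_l' \right) \prod_{l=1}^{n} (1 - F_l(\tilde\sigma_l))\,dt = E_{u_j}(\sigma_1, \ldots, \sigma_n).$$

Let us denote $\sigma_{ij} = \sigma_i(t_j)$ and $\tilde\sigma_{ij} = \tilde\sigma_i(\tilde t_j)$. At the beginning, $\tilde\sigma_{i0} = \sigma_{i0} = 0$ for each $i$, and $\tilde t_0 = 0$. Assume now that we have $\tilde t_{j'}$ defined for $j' \le j$, and $\tilde\sigma_i(t)$ defined on each interval $[\tilde t_{j'}, \tilde t_{j'+1}]$ with $j' < j$. Let us show how to define $\tilde t_{j+1}$ and $\tilde\sigma_i$ on $[\tilde t_j, \tilde t_{j+1}]$.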

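By definition of $t_j$, $k = \sum_{i=1}^{n} \sigma_i'(t)$ is a constant for $t \in [t_j, t_{j+1}]$. Since $\{\sigma_i\}$ satisfy suspend-resume constraints, exactly $k \ge 1$ processes are active in this interval, each with full intensity. Without loss of generality, the active processes are $A_1, A_2, \ldots, A_k$, and

$$E_{u_j}(\sigma_1, \ldots, \sigma_n) = k \int_{t_j}^{t_{j+1}} \prod_{l=1}^{n} (1 - F_l(\sigma_l))\,dt =$$
$$k \prod_{l=k+1}^{n} (1 - F_l(\sigma_{lj})) \int_{t_j}^{t_{j+1}} \prod_{l=1}^{k} (1 - F_l(t - t_j + \sigma_{lj}))\,dt =$$
$$k \prod_{l=k+1}^{n} (1 - F_l(\sigma_{lj})) \int_0^{t_{j+1} - t_j} \prod_{l=1}^{k} (1 - F_l(x + \sigma_{lj}))\,dx. \qquad (95)$$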



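Let $\tilde t_{j+1} = \tilde t_j + k(t_{j+1} - t_j)$, and let us define $\tilde\sigma_i(t)$ on the segment $[\tilde t_j, \tilde t_{j+1}]$ as follows:

$$\tilde\sigma_i(t) = \begin{cases} (t - \tilde t_j)/k + \tilde\sigma_{ij}, & i \le k, \\ \tilde\sigma_{ij}, & \text{otherwise}. \end{cases}$$

In this case, on this segment

$$\sum_{i=1}^{n} \tilde\sigma_i'(t) = 1,$$

which means that the $\tilde\sigma_i$ satisfy the resource sharing constraints. By definition,

$$\tilde t_{j+1} - \tilde t_j = k(t_{j+1} - t_j), \qquad (96)$$

and therefore for processes active on $[t_j, t_{j+1}]$ we obtain

$$\tilde\sigma_{i,j+1} - \tilde\sigma_{ij} = (\tilde t_{j+1} - \tilde t_j)/k = t_{j+1} - t_j = \sigma_{i,j+1} - \sigma_{ij}.$$

For processes idle on $[t_j, t_{j+1}]$ the same equality holds as well:

$$\tilde\sigma_{i,j+1} - \tilde\sigma_{ij} = 0 = \sigma_{i,j+1} - \sigma_{ij},$$

and since $\tilde\sigma_i(0) = \sigma_i(0) = 0$ we obtain the invariant

$$\tilde\sigma_{ij} = \sigma_{ij}. \qquad (97)$$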
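The average cost for the new schedules may be represented as

$$\tilde E_{u_j}(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \int_{\tilde t_j}^{\tilde t_{j+1}} \left( \sum_{l=1}^{n} \tilde\sigma_l' \right) \prod_{l=1}^{n} (1 - F_l(\tilde\sigma_l))\,dt =$$
$$\prod_{l=k+1}^{n} (1 - F_l(\tilde\sigma_{lj})) \int_{\tilde t_j}^{\tilde t_{j+1}} \prod_{l=1}^{k} (1 - F_l((t - \tilde t_j)/k + \tilde\sigma_{lj}))\,dt.$$

Substituting $x = (t - \tilde t_j)/k$ and using (95), (96) and (97), we obtain

$$\tilde E_{u_j}(\tilde\sigma_1, \ldots, \tilde\sigma_n) = k \prod_{l=k+1}^{n} (1 - F_l(\tilde\sigma_{lj})) \int_0^{(\tilde t_{j+1} - \tilde t_j)/k} \prod_{l=1}^{k} (1 - F_l(x + \tilde\sigma_{lj}))\,dx =$$
$$k \prod_{l=k+1}^{n} (1 - F_l(\sigma_{lj})) \int_0^{t_{j+1} - t_j} \prod_{l=1}^{k} (1 - F_l(x + \sigma_{lj}))\,dx = E_{u_j}(\sigma_1, \ldots, \sigma_n).$$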

From the last equation, it immediately follows that

$$E_u(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \sum_{j=0}^{\infty} \tilde E_{u_j}(\tilde\sigma_1, \ldots, \tilde\sigma_n) = \sum_{j=0}^{\infty} E_{u_j}(\sigma_1, \ldots, \sigma_n) = E_u(\sigma_1, \ldots, \sigma_n),$$

which completes the proof.
Q.E.D. 
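
The time-stretching step of the construction can be checked on one interval. The sketch below is an illustration under assumptions not taken from the paper: $k = 3$ processes with exponential distributions, all starting from $\sigma_{lj} = 0$, active at full intensity on an interval of length $T$ in the independent model. In the shared model each of them runs at intensity $1/k$ on an interval of length $kT$. The rates, function names and step counts are hypothetical.

```python
import math

RATES = [1.0, 2.0, 0.5]   # assumed exponential rates of the k = 3 active processes


def survival(l, x):
    """1 - F_l(x) for the assumed exponential distribution of process l."""
    return math.exp(-RATES[l] * x)


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule for f on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps)))


def cost_independent(length):
    """Objective (94) on [0, length]: k processes, each with sigma_l(t) = t."""
    k = len(RATES)
    return trapezoid(lambda t: k * math.prod(survival(l, t) for l in range(k)), 0.0, length)


def cost_shared(length):
    """Objective (94) on [0, k * length]: each process has sigma_l(t) = t / k,
    so the total intensity is 1."""
    k = len(RATES)
    return trapezoid(lambda t: math.prod(survival(l, t / k) for l in range(k)), 0.0, k * length)
```

Both functions return the same value up to numerical error, which is the per-interval equality $\tilde E_{u_j} = E_{u_j}$ used in the proof. For these rates and $T = 1$ the common value is $3(1 - e^{-3.5})/3.5$.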
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